Handbook

Array Processing and Digital Signal Processing



ADVANCED MICRO DEVICES

1

# Advanced Micro Devices

### Array Processing and Digital Signal Processing Handbook

The International Standard of Guality guarantees a 0.05% AQL on all electrical parameters, AC and DC, over the entire operating range.

# INT-STD-500

© 1986 Advanced Micro Devices, Inc.

Advanced Micro Devices reserves the right to make changes in its products without notice in order to improve design or performance characteristics. The performance characteristics listed in this data book are guaranteed by specific tests, correlated testing, guard banding, design and other practices common to the industry. For specific testing details contact your local AMD sales representative. The company assumes no responsibility for the use of any circuits described herein.

901 Thompson Place, P.O. Box 3453, Sunnyvale, California 94088 (408) 732-2400 TWX: 910-339-9280 TELEX: 34-6306

#### Table of Contents

| 1. INTRODUCTION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                 |                                           |                                 |                                        | 1                                                                           |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|---------------------------------|----------------------------------------|-----------------------------------------------------------------------------|
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                 |                                           |                                 |                                        |                                                                             |
| 2. NUMBER SYSTEMS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                 |                                           |                                 | •••                                    | 3                                                                           |
| 2.1 Fixed Point Numbers                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                 |                                           |                                 |                                        | 3                                                                           |
| 2.1.1 Fixed Point Operations                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | •••                                                                                                                                                             |                                           |                                 |                                        | 4                                                                           |
| •                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                 |                                           |                                 |                                        |                                                                             |
| 2.2 Floating Point Numbers                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | •••                                                                                                                                                             | •••                                       | •••                             | •••                                    | 7                                                                           |
| 2.2.1 When to use Floating Po                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | int                                                                                                                                                             |                                           | •••                             | •••                                    | 8                                                                           |
| 2.2.2 Floating Point Formats                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                                                                                                                                                 |                                           |                                 |                                        | 8                                                                           |
| 3. ARRAY PROCESSING ALGORI                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | гнмз                                                                                                                                                            |                                           |                                 |                                        | 9                                                                           |
| 3.1 Digital Filters in the Time D<br>3.2 Filtering in the Frequency I                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                 |                                           |                                 |                                        | 9                                                                           |
| the Fourier Transform                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                 |                                           |                                 |                                        | 14                                                                          |
| 3.2.1 Algorithm for Decimation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                 |                                           |                                 |                                        | 14                                                                          |
| 3.2.2 Algorithm for Decimation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                 | •••                                       |                                 | •••                                    | 14                                                                          |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                 |                                           | •••                             | •••                                    |                                                                             |
| 3.2.3 Comparison of FFT and                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                 | •••                                       | •••                             |                                        | 18                                                                          |
| 3.2.4 Inverse Fourier Transfor                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | m                                                                                                                                                               | •••                                       | •••                             |                                        | 18                                                                          |
| 3.2.5 Radix 4 FFT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                 |                                           | •••                             |                                        | 18                                                                          |
| 3.2.6 Real-Valued Input Fourie                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | er Transforms                                                                                                                                                   |                                           |                                 |                                        | 20                                                                          |
| 3.3 Magnitude Calculations                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                 |                                           |                                 |                                        | 22                                                                          |
| 4. SYSTEM DESIGN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                 |                                           |                                 | ••••                                   | 25                                                                          |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                 |                                           |                                 |                                        |                                                                             |
| 4.1 Array Processor Design ba                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | sed on Am295                                                                                                                                                    | 00 Family                                 | ,                               |                                        | 25                                                                          |
| 4.1 Array Processor Design ba                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                                                                                                                 |                                           |                                 |                                        | 25<br>26                                                                    |
| 4.1.1 Arithmetic                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                 |                                           |                                 |                                        | 26                                                                          |
| 4.1.1 Arithmetic<br>4.1.2 Memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | <br>                                                                                                                                                            | <br>                                      | ····<br>···                     | <br>                                   | 26<br>30                                                                    |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                 |                                           |                                 | ····<br>···                            | 26<br>30<br>31                                                              |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | <br><br>                                                                                                                                                        | <br><br>                                  | ····<br>····                    | ····<br>····<br>···                    | 26<br>30<br>31<br>34                                                        |
| 4.1.1       Arithmetic          4.1.2       Memory          4.1.3       Addressing          4.1.4       Control          4.1.5       Input/Output                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | <br><br><br>                                                                                                                                                    | <br>                                      | ····<br>···                     | ····<br>···                            | 26<br>30<br>31<br>34<br>35                                                  |
| 4.1.1       Arithmetic          4.1.2       Memory          4.1.3       Addressing          4.1.4       Control          4.1.5       Input/Output          4.1.6       Timing Considerations                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | <br><br><br>                                                                                                                                                    | ····<br>···<br>···<br>···                 | ····<br>····<br>···             | ····<br>····<br>···                    | 26<br>30<br>31<br>34<br>35<br>38                                            |
| 4.1.1 Arithmetic          4.1.2 Memory          4.1.3 Addressing          4.1.4 Control          4.1.5 Input/Output          4.1.6 Timing Considerations          4.1.7 Microcode                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | ···<br>···<br>···<br>···                                                                                                                                        | <br><br>                                  | ····<br>···<br>···<br>···       | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38                                      |
| 4.1.1       Arithmetic          4.1.2       Memory          4.1.3       Addressing          4.1.4       Control          4.1.5       Input/Output          4.1.6       Timing Considerations                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | ···<br>···<br>···<br>···                                                                                                                                        | ····<br>···<br>···<br>···                 | ····<br>····<br>···             | ····<br>····<br>···                    | 26<br>30<br>31<br>34<br>35<br>38                                            |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | <br><br><br><br>Accumulator                                                                                                                                     |                                           |                                 | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74                                |
| 4.1.1 Arithmetic          4.1.2 Memory          4.1.3 Addressing          4.1.4 Control          4.1.5 Input/Output          4.1.6 Timing Considerations          4.1.7 Microcode                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | ···<br>···<br>···<br>···                                                                                                                                        | ····<br>···<br>···<br>···                 | ····<br>···<br>···<br>···       | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38                                      |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | <br><br><br>Accumulator<br><br>pring from                                                                                                                       |                                           |                                 | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79                          |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Accumulator                                                                                                                                                     |                                           |                                 | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74                                |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply-<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A                                                                                                                                                                                                                                                                                                                                                                                                                             | Accumulator<br><br>pring from<br>ddressing for                                                                                                                  | ····<br>···<br>···<br>···                 |                                 | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79<br>79                    |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Accumulator<br><br>pring from<br>ddressing for                                                                                                                  | ····<br>···<br>···<br>···                 |                                 | ····<br>···<br>···<br>···              | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79                          |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply.<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Design                                                                                                                                                                                                                                                                                                                                                                                           | Accumulator<br><br>pring from<br>ddressing for<br>                                                                                                              | ····<br>···<br>···<br>···                 |                                 | ····<br>···<br>···<br>···<br>···       | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79<br>79                    |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Design<br>500-kHz Single-Board FFT System                                                                                                                                                                                                                                                                                                                                                         | Accumulator<br><br>pring from<br>ddressing for<br>                                                                                                              | ····<br>···<br>···<br>···                 |                                 | ····<br>···<br>···<br>···<br>···       | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79<br>79                    |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Design<br>500-kHz Single-Board FFT System<br>DSP-Optimized Chips, EDN.                                                                                                                                                                                                                                                                                                                            | Accumulator<br><br>pring from<br>ddressing for<br>n<br>h<br>n                                                                                                   | ····<br>···<br>···<br>···                 |                                 | ····                                   | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79<br>79<br><b>83</b>       |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Design<br>500-KHz Single-Board FFT System<br>DSP-Optimized Chips, EDN.<br>Trim DSP Overhead by Changing S                                                                                                                                                                                                                                                                                         | Accumulator<br>Accumulator<br>pring from<br>ddressing for<br>n<br>hncorporates<br>                                                                              | ····<br>···<br>···<br>···                 |                                 | ······································ | 26<br>30<br>31<br>34<br>35<br>38<br>78<br>79<br>79<br><b>83</b><br>91       |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Desig<br>500-kHz Single-Board FFT System<br>DSP-Optimized Chips, EDN.<br>Trim DSP Overhead by Changing y<br>Sampling Rate, Integrated Circ                                                                                                                                                                                                                                                        | Accumulator<br>Accumulator<br>pring from<br>ddressing for<br>n<br>hncorporates<br>rour<br>cuits                                                                 | ····<br>···<br>···<br>···<br>···          |                                 | ····                                   | 26<br>30<br>31<br>34<br>35<br>38<br>38<br>74<br>79<br>79<br><b>83</b>       |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Desig<br>500-kHz Single-Board FFT System<br>DSP-Optimized Chips, EDN.<br>Trim DSP Overhead by Changing J<br>Sampling Rate, Integrated Cirr<br>DSP Building Blocks Allow Resourd                                                                                                                                                                                                                   | Accumulator<br>Accumulator<br>pring from<br>ddressing for<br>n<br>n Incorporates<br>your<br>cuits<br>co Optimization.                                           | <br><br><br><br><br>This manu             | <br><br><br><br><br><br>        |                                        | 26<br>30<br>31<br>34<br>35<br>38<br>74<br>79<br>79<br>83<br>91<br>99        |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Desig<br>500-kHz Single-Board FFT System<br>DSP-Optimized Chips, EDN.<br>Trim DSP Overhead by Changing y<br>Sampling Rate, Integrated Cir<br>DSP Building Blocks Allow Resourd<br>was originally prepared for and                                                                                                                                                                                 | Accumulator<br><br><br><br>Accumulator<br><br><br>ddressing for<br><br><br>ddressing for<br><br><br>n Incorporates<br><br><br>prover<br>cuits<br>presented at W | <br><br><br><br><br>This manu<br>ZESCON/8 | <br><br><br><br><br><br>        | ······································ | 26<br>30<br>31<br>34<br>35<br>38<br>74<br>79<br>79<br>83<br>91<br>99<br>109 |
| <ul> <li>4.1.1 Arithmetic</li> <li>4.1.2 Memory</li> <li>4.1.3 Addressing</li> <li>4.1.4 Control</li> <li>4.1.5 Input/Output</li> <li>4.1.6 Timing Considerations</li> <li>4.1.6 Timing Considerations</li> <li>4.1.7 Microcode</li> <li>4.1.2 Digital Filter using Multiply</li> <li>5. ARTICLES</li> <li>Record Signal-Processing Rates S<br/>Chip Refinements, Electronics</li> <li>One-Chip Sequencer Shapes up A<br/>Large FFT's, Electronic Desig</li> <li>500-kHz Single-Board FFT System<br/>DSP-Optimized Chips, EDN.</li> <li>Trim DSP Overhead by Changing y<br/>Sampling Rate, Integrated Cim<br/>DSP Building Blocks Allow Resourd<br/>was originally prepared for and<br/>A New Approach to Floating Point I</li> </ul> | Accumulator<br><br><br>pring from<br>ddressing for<br>1<br>n Incorporates<br>your<br>cuits<br>presented at W<br>DSP, 1984 IEEE                                  | <br><br><br><br><br>This manu<br>Press.   | <br><br><br><br><br><br>uscript |                                        | 26<br>30<br>31<br>34<br>35<br>38<br>74<br>79<br>79<br>83<br>91<br>99        |
| 4.1.1 Arithmetic<br>4.1.2 Memory<br>4.1.3 Addressing<br>4.1.4 Control<br>4.1.5 Input/Output<br>4.1.6 Timing Considerations<br>4.1.7 Microcode<br>4.2 Digital Filter using Multiply<br>5. ARTICLES<br>Record Signal-Processing Rates S<br>Chip Refinements, Electronics<br>One-Chip Sequencer Shapes up A<br>Large FFT's, Electronic Desig<br>500-kHz Single-Board FFT System<br>DSP-Optimized Chips, EDN.<br>Trim DSP Overhead by Changing y<br>Sampling Rate, Integrated Cir<br>DSP Building Blocks Allow Resourd<br>was originally prepared for and                                                                                                                                                                                 | Accumulator<br><br><br>pring from<br>ddressing for<br>1<br>n Incorporates<br>your<br>cuits<br>presented at W<br>DSP, 1984 IEEE                                  | <br><br><br><br><br>This manu<br>Press.   | <br><br><br><br><br><br>uscript |                                        | 26<br>30<br>31<br>34<br>35<br>38<br>74<br>79<br>79<br>83<br>91<br>99<br>109 |

| 6. PRODUCT SPECIF        |     | ONS |     | <br> |     |     |
|--------------------------|-----|-----|-----|------|-----|-----|
|                          |     |     |     |      |     |     |
| Am29501A                 | ••• |     |     | <br> |     | 129 |
| Am29509                  | ••• |     |     | <br> |     | 130 |
| Am29510/L510             | ••• |     |     | <br> |     | 131 |
| Am29C516 Family          | ••• |     |     | <br> |     | 132 |
| Am29C517 Family          | ••• |     |     | <br> |     | 132 |
| Am29520/521              |     |     |     | <br> |     | 133 |
| Am29524/525              | ••• |     |     | <br> |     | 134 |
| Am29526/27/28/29         | ••• |     |     | <br> | ••• | 135 |
| Am29540                  | ••• |     |     | <br> |     | 136 |
| Am29323                  | ••• |     |     | <br> | ••• | 137 |
| Am29325                  | ••• |     |     | <br> |     | 138 |
| Am29331                  | ••• |     |     | <br> | ••• | 139 |
| Am29332                  | ••• |     |     | <br> |     | 140 |
| Am29334                  | ••• |     |     | <br> |     | 141 |
| Am2910A                  |     |     |     | <br> |     | 142 |
| Am29C10A                 |     |     |     | <br> | ••• | 143 |
| Am29112                  | ••• |     |     | <br> |     | 144 |
| Am29C116                 |     |     |     | <br> | ••• | 145 |
| Am29PL141                |     |     |     | <br> |     | 146 |
| 7. APPENDIXES            |     |     |     | <br> |     |     |
| Appendix 1               |     |     |     |      |     | 1-1 |
| Appendix 2               |     | ••• | ••• | <br> | ••• | 2-1 |
| Appendix 2<br>Appendix 3 | ••• |     |     | <br> | ••• |     |
| Appendix 3<br>Appendix 4 | ••• | ••• |     | <br> |     | 3-1 |
|                          | ••• |     | ••• | <br> | ••• | 4-1 |
| Appendix 5               | ••• | ••• |     | <br> | ••• | 5-1 |
| Appendix 6               |     |     |     | <br> | ••• | 6-1 |
| Appendix 7               | ••• | ••• |     | <br> | ••• | 7-1 |

#### CHAPTER 1 INTRODUCTION

#### What Is an Array Processor?

In recent years, Array Processing has become an increasingly significant aspect of computing. What once was a mysterious art, is now becoming common practice. Array processing is a form of computing that uses specialized hardware for special results—the array processor. This machine is characterized by its ability to handle many arithmetic computations at high speed. In other words, it is a "number cruncher." However, the specialization goes beyond a powerful hardware arithmetic unit. The machine invariably performs best when the data it is processing is structured in an array, such as a matrix or vector. Hence the name, Array Processor.

#### How Do Array Processors and General-Purpose Computers Differ?

Array processors and general purpose computers differ in a number of important aspects. The general purpose computer is usually of the classical Von Neumann architecture that was implemented by Mauchley and Eckert in the ENIAC machine, which became the first electronic Stored Program General Purpose Digital Computer. This machine had a memory area that allowed instructions and data to be intermixed. Conversely, array processors have separate memories for instructions and data (Harvard architecture). There may in fact be separate data memories for coefficients and variable data. The machines tend to be highly parallel, to allow for simultaneous multiplying and adding in the arithmetic section, while also performing address calculations for retrieving and storing the required data values.

## What Is the Usual Architecture of Array Processors?

While there is no one standard architecture for all array processors, there are a number of characteristics that make the machine recognizable as an array processor. In addition to the above mentioned feature of separate data and instruction memories, pipelining is a common architectural attribute. This technique consists of placing intermediate registers in the data path, breaking up long combinatorial delay paths into shorter paths terminating in registers. The rate at which these registers can be clocked determines how fast the system runs. The register clock rate is obviously dependent upon the delay time of the operation preceding it. Pipelined systems have 'latency', which is the number of clock cycles that passes before the first valid result appears, but this value will generally be low compared to the large number of calculations performed.

Array processors also have dedicated hardware multipliers as part of their powerful arithmetic sections. Hardware floating point arithmetic may also be available, whereas some machines may have shifters to accomodate block floating point, and others may be dedicated to integer operation only. The basic operations that consume most of the processing time in matrix operations are multiplication and addition, as well as accessing the actual data values. Thus the architecture tends to be optimized for these operations.

Microprogrammed architectures tend to be popular in array processing machines. This is the familiar AMD 2900 family "bit-slice" structure that allows a machine to be constructed from "building blocks" of ALU, sequencer, memory access and other required functions, without any predetermined instruction set or architectural constraints that might be imposed by fixed-instruction devices. Microprogramming is the technique of giving a machine its instruction set by means of microinstructions stored in a high speed memory and accessed by a special sequencer. These microinstructions operate at the primitive level of register, bus, and ALU function control. Since the control store is a memory, machine behavior may be modified by changing one or more of the microinstruction bits or words. The technique leads to an extremely flexible, often very high speed, implementation.

#### Where Are Array Processors Used?

If array processors are indeed so powerful, then one might reasonably ask why they don't replace general purpose machines. While array processors do an excellent job of handling their specialized type of problem, they are rather clumsy when confronted with problems that require a lot of branching within the program, which is where the general purpose computer performs so well. So rather than replacing the general purpose computer with an array processor, the tendency is to create an enhanced machine by using an array processor as a peripheral, as an 'accelerator' to the arithmetic-intensive portions of problems.

Array processors have traditionally been implemented in peripheral fashion and as specialized Super Computers that had array processing type architectures. This latter class of machine tended to be extremely expensive, but very powerful. The peripheral array processor was such a desirable approach that a number of companies make their entire livelihood from offering such devices. These peripheral array processors, or accelerators, generally attach to the bus or I/O structure of a scientific minicomputer. Now a tendency is evolving to offer array processors as options on engineering work stations, or even as option boards for personal computers.

Various types of problems lend themselves to array processing solutions, and so they are considered here. We wish to distinguish array processing by the type of hardware solution required, rather than the end application. Thus, robotics controllers, radar and sonar processors, flight simulators, graphics terminals, communications processors, medical analyzers, intelligent vision systems, and speech recognizers are all of interest here. If the problem is solved using multiply-and-add intensive algorithms, and the data and coefficients are structured in arrays, then we have an arrav processing application.

#### **Digital Signal Processing**

Digital signal processing is treated here as a subset of array processing because of the similarities that exist in the hardware and algorithms. The DSP engine tends to be more specific in its design and is generally imbedded in other hardware. DSP also tends to start out with a "live" analog signal, which is A/D converted, processed, and may or may not be converted back to analog. One could put an A/D converter in front of an array processor board and solve DSP problems.

#### What Is in this Book

We have attempted to collect in this book background and applications material that will motivate and guide the independent study of Array Processing. This book does not attempt to be a comprehensive text on the subject, but tends to emphasize the practical aspects of building array processor boards, implementing FFT's and FIR/IIR filters, selecting appropriate hardware, and writing microcode.

Finally, note that the devices whose data sheet summaries are included here are not only suitable for array processing applications. For example, for a fast eight-bit ALU with multiple I/O ports, the Am29501 is ideal. The part certainly won't be aware of what type of problem it is solving, so this should not limit the innovative spirit of the design engineer. For more information on the devices and systems described herein, contact your local AMD sales office, or AMD Headquarter Applications Department (408) 982-6266.

#### CHAPTER 2 NUMBER SYSTEMS

#### 2.1 Fixed Point Numbers

A binary number is an ordered set of binary digits (bits), each of which has a value 0 or 1. Each bit,  $b_i$ , is assigned a binary weight,  $2^i$ , and the value of the number is the sum of the weighted digits.

$$V = \sum_{i} b_{i} * 2^{i}$$

The simplest form of binary number is the unsigned integer. In an N-bit unsigned integer the index, i, ranges from 0 to N-1. The value is given by:

N...1

$$V_{integer} = \sum_{i=0}^{N} b_i * 2^i$$

The range of V is from 0 to  $2^{N}-1$ . This type of number has two obvious limitations; it cannot represent quantities which are negative or fractional. There are many ways to represent negative numbers. The simplest method is to use an unsigned number to represent the magnitude, and a flag to indicate the sign. Not suprisingly this scheme is known as signed-magnitude. In an N-bit representation, the most significant bit,  $b_{N-1}$ , is taken for the flag, with a 0 signifying positive, and a 1 negative. This leaves N-1 bits for the magnitude, giving a range from  $-2^{(N-1)+1}$  to  $+2^{(N-1)-1}$ . The value is given by:

$$V_{sign magnitude} = (-1)^{b} N-1 * \sum_{i=0}^{N-2} b_{i} * 2^{i}$$

An idiosyncrasy of signed-magnitude numbers is that there are two representations for zero, positive and negative. A similar scheme, which shares this characteristic, is one's complement. In one's complement, negative numbers are represented by inverting all bits of an unsigned number representing the magnitude. In order to distinguish positive and negative numbers, the magnitude range is restricted such that it can be expressed in N-1 bits. Thus the most significant bit is 0 for a positive number, and 1 for a negative number. One's complement numbers differ from signed-magnitude numbers only in that the magnitude bits are inverted in negative numbers.

Obviously, the range of numbers represented by a

one's complement number must be the same as for signed-magnitude numbers. The value of a one's complement word may most easily be determined by treating it as an unsigned word, after having inverted all bits if the most significant bit is 1, in which case the value is negative. Alternatively, the value of the magnitude bits may be calculated, and if the most significant bit is 1,  $2^{(N-1)-1}$  subtracted from this value.

V1's complement = -b<sub>N-1</sub> \* 2<sup>N-1</sup> - 1 + 
$$\sum_{i=0}^{N-2}$$
 b<sub>i</sub> \* 2<sup>i</sup>

A simple technique by which negative numbers may be represented without double representation of zero, is to add to the desired value the magnitude of the most negative representable number. This gives a positive number which may be represented in the unsigned format. The value of the number may be obtained by simply reversing this process. This scheme is known as offset binary, or excess-M, where M is the number added. The number M is often, but not always a power of two.

$$V_{\text{excess }M} = -M + \sum_{i=0}^{N-1} b_i * 2^i$$

The special case, where  $M = 2^{(N-1)}$ , has the property that all negative numbers have a most significant bit which is 0, while zero and all postive numbers have a 1. Inverting this most significant bit leads to a scheme known as two's complement, which may be interpreted in several ways. The most significant bit is often treated as a sign flag, as it is in signed-magnitude. If the sign bit is 1, the number is negative and the following serial process is applied to convert it to an unsigned number representing its magnitude. Starting at the least significant bit, the bits are inspected in turn until the first 1 is encountered. This and all lesser significant bits are left unchanged. All more significant bits are inverted. This process operates in both directions, converting a negative number to a positive number of the same magnitude, and vice versa. It may be viewed as inverting the number (one's complement) and incrementing, or as subtracting the number from  $2^{(N-1)}$ .

$$V_{2's \text{ complement}} = - (b_{N-1} * 2^{N-1} - \sum_{i=0}^{N-1} b_i * 2^i)$$

$$= -b_{N-1} * 2^{N-1} + \sum_{i=0}^{N-2} b_i * 2^i$$

Inspection of this formula shows that the magnitude of the sign bit's weighting is consistent with its position if the number were unsigned, but that the weighting is negative. This is an important conclusion, and leads to the most useful interpretation of two's complement numbers; they are identical to unsigned numbers except the most significant bit is weighted negatively. The range of values which can be represented by an N-bit two's complement number is -2(N-1) to +2(N-1)-1.

Any of the above schemes may also be used to represent fractional numbers. This is achieved simply by adopting a convention that the weighting of the least significant bit is  $2^{(-P)}$  rather than 20, and adjusting the other weights accordingly. Conceptually, this locates the binary point P bits from the least significant end. Because such a convention must be chosen in advance, and adhered to for all numbers, this is known as a fixed point number scheme. Other schemes, where the number contains a parameter locating the binary point, are known as floating point.

#### 2.1.1 Fixed Point Operations

Three basic operators are described here: addition, subtraction and multiplication. Only unsigned and two's complement formats are described in detail. They are the two formats most commonly used in fixed point operation. Signedmagnitude and offset binary are commonly used in floating point (see below), and are usually treated in fixed point by converting them to unsigned or two's complement, performing the operation and reconverting. One's complement is not in general usage.

Addition of unsigned numbers is most easily performed by an iterative process known as ripple carry. The iterative block is shown in Figure 2-1.1. This has three inputs, which are equally weighted. Two of these are for operand bits,  $A_i$  and  $B_i$ , and the third is a carry input,  $C_{i-1}$ . The two outputs may be considered as a 2-bit word, representing the number of 1's present at the inputs. The unit weighted bit is the sum output,  $S_i$ , and the bit with weight two is the carry output,  $C_i$ . In cascade, the carry output of one cell becomes the carry input to the next more significant cell, maintaining the equal weighting in that cell, Figure 2-1.2.



The carry input to the least significant cell,  $C_{-1}$ , is assumed to be 0. For the complete multi-bit adder, for each pair of input bits  $A_i$  and  $B_i$ , there is one output bit,  $S_i$ . This new number is the sum. The characteristic that each carry out must be generated in turn before higher bits can be determined leads to the name "ripple carry."

It is also possible to generate the carry inputs to each cell without waiting for the ripple. Each cell is capable of creating a carry into the next cell in two ways; it can generate a carry if both  $A_i$  and  $B_i$  are 1s or it can propagate a carry if either  $A_i$  or  $B_i$  is a 1 and the carry input  $C_{i-1}$  is a 1. This is called "lookahead carry" and can be expressed by the equation:

$$C_i = A_i * B_i + (A_i + B_i) * C_{i-1}$$

 $C_{i-1}$  can be expanded so that the equation for a carry lookahead of two cells is:

 $C_i = A_i * B_i + (A_i + B_i) * (A_{i-1} * B_{i-1} + (A_{i-1} + B_{i-1}) * C_{i-2})$ 

This expansion can continue until the carry is expressed entirely in terms of the inputs and each cell can produce its output without waiting for output from any other cell.

As noted above, the range of an N-bit number is limited, and it is possible to overflow this range when adding two numbers. This may be overcome by making the adder one bit longer, thus doubling the range of the output. In order to provide the additional inputs, the operands are zeroextended. Inspection of the above formulae will show that the value of the unsigned words are not affected by leading 0's. Zero-extension is also used when adding numbers of dissimilar length. This is performed in an adder long enough to handle the larger input, the other being zeroextended. When the operand inputs to a cell are both 0, as in extension to prevent overflow, the sum output is equal to the carry input, and the carry output is identically 0. This fact may be exploited to save hardware, the carry line being used directly as the extra sum.

As the magnitude bits of a two's complement number are weighted the same as in an unsigned number, the same adder scheme may be used. In the most significant cell, the bits all have the same magnitude weighting, but while the carry remains positive, the operand inputs become negative. The cell is required to output a 2-bit two's complement number, that again represents the number of 1's present at the input, but taking into account their sign. As in the unsigned case, the carry would represent the sign-bit of a one bit longer word, which allows for overflow. Appropriately, the sum bit of this cell would be positively weighted as a magnitude-bit of the new word.

Adding two's complement numbers of dissimilar length raises a question. The value of a two's complement number is not affected by adding leading 0's if the existing weights, remain unchanged. However, this gives a number with a negatively weighted bit other than in the most significant position, which does not agree with the definition. This problem is overcome by using signextension, rather than zero-extension.

In sign-extension, additional bits are added which match the sign-bit. It must be stressed, however, that these are not extra sign-bits. A two's complement word can, by definition, only have one sign-bit; the negatively weighted most significant bit. When a new sign-bit is added, the old sign-bit reverts to a magnitude bit. If the number is positive, both old and new sign bits are 0, and the value cannot be changed by altering the weighting of 0's.

If the number is negative, the reversion of the old sign-bit to a magnitude bit changes the value positively by twice its weight (from  $-2^{N-1}$  to  $2^{N-1}$ ). This exactly cancels the contribution to the value from the new, more significant sign-bit, thus leaving the value unchanged. This proceedure may be repeated, making the number as long as necessary.

This process is reversible. If the most significant magnitude bit matches the sign-bit, then the number may be reduced in length by eliminating the sign-bit, the new most significant bit becoming the sign-bit. This fact is exploited to simplify the logic in the adder. If it is known that the sum cannot overflow into the N+1th bit, the sum output of the sign-bit adder cell can be taken as the sign-bit of the N-bit word. This is logically equivalent to the sum output of an unsigned adder cell, which may be used in place of the special cell. The carries, however, are not equivalent. If overflow is possible, it may be protected against by sign-additional adder cell to give the N+1-bit output.

Two's complement subtraction is often described by modifying the adder, and re-interpreting the carry to become a borrow-not (analogous to decimal subtraction). This is confusing. A more straightforward explanation is that the number to be subtracted has its sign changed by being two's complemented, and the result added to the other operand, using the adder as an adder. As described above, two's complementing a number involves one's complementing it and then incrementing it. The one's complement can be performed with inverters, and the incrementation effected by entering an extra 1 in the unused least significant carry input, Figure 2-1.3. If exclusive ORs are used in place of inverters to allow controlled inversion, and the least significant carry also controlled, an adder/subtractor is obtained.

Unsigned subtraction does not really exist. The unsigned operands are converted to two's complement by adding 0 (positive) sign-bits. These are subtracted as above, giving a signed result. If this is positive, the 0 sign-bit can be dropped, reverting to an unsigned format. If it is known in advance that this is case, then it is not necessary to calculate the redundant sign-bit. This is what is sometimes referred to as an unsigned subtractor.

When adding two unsigned numbers with the result expressed in the same number of bits as the inputs, overflow is signified by a carry out of the most significant bit. In unsigned subtraction, a negative result is signalled by the absence of a carry out. In all two's complement operations, overflow is signalled by the carry out of the sign-bit being different from the carry into it. This may be detected with an exclusive OR (XOR).

Unsigned multiplication consists of addition of the partial products formed by weighting the multiplicand by each bit of the multiplier. Weighting by a multiplier bit which is 0 results in a partial product that is zero and weighting by a bit which is a 1 is a left shift of the multiplicand by a number of places equal to the bit position of the multiplier bit. A simple algorithm to perform multiplication is a shift and add procedure in which the multiplicand is shifted one bit position at a time and added to the product based on the corresponding multiplier bit and using the rules for adding operands of unequal lengths as stated in the discussion of addition above. Multiplication of two's complement numbers is the same when the rules for two's complement are applied, i.e., when the partial products are sign-extended instead of zero-extended for the shifted addition and the partial product from the multiplier sign bit is weighted by  $-b_{N-1}*2^{N-1}$  instead of  $+b_{N-1}*2^{N-1}$ .

Multiplication of unsigned fractions is identical to multiplication of unsigned integers. Placement of the binary point is a matter of interpretation and does not require alteration of the algorithm, However, multiplication of two's complement fractions results in a product with different bit weights than the operands as shown in Figure 2-1.4. To obtain the same bit weights in the product as the operands it is necessary to shift the product left one position as shown in Figure 2-1.5. A product which has the same format as the operands is obtained when the less significant product is truncated or causes rounding of the more significant part. One problem with this shift is the possibility of overflow when both multiplier and multiplicand are equal to -1.0. The product is +1.0 which cannot be represented in two's complement fractional notation.

A common technique used to speed up multiplication is the Booth algorithm which examines consecutive bits of the multiplier. Whenever the consecutive multiplier bits change from a 1 to a 0, the multiplicand is added to the product with the proper bit weighting. When the bits change from a 0 to a 1, the multiplicand is subtracted; no operation is necessary when the bits are the same. This algorithm is based on the identities 3 = 2+1 = 4-1, 7 = 4+2+1 = 8-1, etc. which allow two operations (one addition and one subtraction) to replace a potentially larger number of operations (two additions, three additions, etc.). The worst case is a multiplier with alternating 1s and 0s for which there is the same number of operations as the add and shift algorithm when the multiplier is all 1s, i.e., one operation per multiplier



bit. A modification to the algorithm examines pairs of multiplier bits as shown in Figure 2-1.6 This modification is usually done to modularize hardware into a cell which can be repeated for each pair of multiplier bits.

#### 2.2 What Is A Floating-Point Number?

The numbers one encounters everyday, such as 12, 34.56, 0.0789, etc., are known as fixed-point numbers, because the decimal point is in a fixed position. Such numbers are fairly closely matched in magnitude and within about ten orders of magnitude from unity. Examples of such numbers are found in bank accounts, unit prices of store items and paychecks.

In scientific notation applications, the numbers encountered can be very large. Avogadro's number expressed in fixed-point notation is approximately

602,250,000,000,000,000,000,000.

A scientist may also use Planck's constant which would be approximately

#### 0.0000000000000000000000006626196

erg.sec. in fixed-point notation. These examples demonstrate the undesirability of writing fixed-point notation to represent numbers such as Avogadro's number and Planck's constant.

When a scientist writes the value of Avogadro's number, he writes 6.0225 x  $10^{23}$ . Similarly, he would express Planck's constant as 6.626196 x  $10^{-27}$  erg.sec.

The number 6.0225 x  $10^{23}$  is thus observed to consist of four parts:

**Sign.** The sign of the number (+ or –). The plus sign is usually assumed when no sign is shown.

Mantissa. Sometimes also known as the fraction. The mantissa describes the actual number. In the example, the mantissa is 6.0225.

**Exponent.** Sometimes also known as the characteristic. The exponent describes the order of magnitude of the number. In the example, the exponent is 23.

**Base.** Sometimes also known as the radix. The base is the number base in which the exponent is raised. In the example, the base is 10.

The parts of a floating-point number can then be represented by the following equation:

$$F = (-1)^{S} \cdot M \cdot B^{E}$$

where

- F = Floating-point number
- S = Sign of the Floating-point number, so thatS = 0 if the number is positive and S=1 if thenumber is negative
- M = Mantissa of the floating-point number
- B = Base of the floating-point number
- E = Exponent of the floating-point number.

| ^ | 0   | ^   | NOP        |
|---|-----|-----|------------|
| v | v   | U   | NOF        |
| 0 | 0   | 1   | <b>x</b> 1 |
| 0 | 1   | 0   | <b>x</b> 1 |
| 0 | 1   | 1   | x 2        |
| 1 | 0   | 0   | -x2        |
| 1 | 0   | 1   | -x 1       |
| 1 | 1   | 0   | -x 1       |
| 1 | 1   | 1   | NOP        |
|   | Fig | ure | 2-1.6      |



#### 2.2.1 When Should Floating Point Be Used?

Although floating-point numbers are useful when numbers of very different magnitude are used, they should not be used indiscriminately. There is an inherent loss of accuracy and increased execution time for floating-point computation on most computers. Floating-point computation suffers the greatest loss of accuracy when two numbers of closely matched magnitude are subtracted from each other, or two numbers of opposite sign—but almost equal magnitude—are added together. Therefore, the Associative Law in arithmetic

$$A + (B + C) = (A + B) + C$$

does not always hold true if B is of opposite sign to A and C and very similar in magnitude to either A or C.

In most computers, hardware floating-point multiply and divide takes approximately the same amount of execution time as hardware fixed-point multiply and divide, but hardware floating-point add and subtract usually takes considerably more time than hardware fixed-point add and subtract. If the computer lacks floating-point hardware, all floatingpoint computations will consume more CPU time than fixed-point computations.

#### 2.2.2 Floating-Point Formats

The following three number bases are commonly used in floating-point number systems:

- 1. Binary—The base is 2.
- 2. Binary Code Decimal-The base is 10.
- 3. Hexadecimal-The base is 16.

The two types of exponents used in floating-point number systems are the biased exponent and the unbiased exponent. An unbiased exponent is a two's complement number. An exponent said to be biased by N (or excess N notation) means that the coded exponent is formed by adding N to the actual exponent in two's complement form. Any overflow generated from the addition is ignored. The result becomes an unsigned number. Most common floating-point systems use a biased exponent because it simplifies floating-point During floating-point computation, hardware. arithmetic operations such as add and subtract need to be performed on the exponent of the operands. If a biased exponent is used, the arithmetic logic unit (ALU) needs only to perform unsigned arithmetic. If an unbiased exponent is used, the ALU must perform two's complement arithmetic; and overflow conditions are more difficult to detect.

Floating-point numbers must always be presented to the computer in "normalized" form (i.e., the most significant digit of the mantissa is always non-zero, except if the number is zero). For a binary floatingpoint system, this means that the leading binary bit of the mantissa is always 1 (except when the number is zero). In some floating-point number systems, this 1 bit is suppressed on input or output to the floating-point processor. The saved bit can be used for one more bit of precision or one more bit of exponent range.

The IEEE and DEC floating-point formats are covered in significant detail in AMD's data sheet.

#### CHAPTER 3 ARRAY PROCESSING ALGORITHMS

#### 3.1 DIGITAL FILTERS IN THE TIME DOMAIN

Digital filters are devices which accept a sequence of digital data samples and produce a modified sequence which is a linear combination of previous inputs and outputs. The sequence is usually data which has been obtained from a continous analog waveform that has been digitized at uniform time intervals. Uniform sampling is used since it simplifies analysis of the system performance and does not create any undesirable side effects or limitations. The usual notation when dealing with the digitized sequences is to associate each sample with a polynomial term, e.g.,

$$\{X_n\} = \dots a, b, c, d, e \dots \text{ for } n = -2, -1, 0, 1, 2$$
  
=  $\dots az^{-2} + bz^{-1} + cz^0 + dz^1 + ez^2 \dots$ 

This polynomial is the Z-transform of the original sequence and can also be written:

$$X(z) = \sum x_n z^{-n}$$

The relationship between the input and output of a digital filter is:

$$y_n = \sum x_m h_{n-m}$$

where  $\{x_m\}$  and  $\{y_n\}$  are the input and output sequences and  $\{h_{n-m}\}$  is the impulse response of the filter. When  $\{h_{n-m}\}$  has a finite number of non-zero terms, the filter has a Finite Impulse Response (FIR). Otherwise it is described as having an Infinite Impulse Response (IIR).

If we delay an input sequence by one sample period, the result is the same as the input sequence multiplied by  $z^{-1}$ . The combination of previous inputs and outputs of a digital filter can therefore be represented by:

$$y_n = \sum a_k x_{n-k} - \sum b_k y_{n-k}$$

where the first sum is a linear combination of the current and previous inputs of the sequence and the second sum is a combination of the previous outputs. A recursive filter of this type has an infinite impulse response. When the coefficients of the second sum are all zero, the filter becomes non-recursive and the impulse response have a finite number of terms (an FIR filter). A Z-transform of each sequence in the digital filter can be written:

$$Y(z) = \sum y_{n} z^{-n}$$
$$= \sum \left[ \sum a_{k} x_{n-k} - \sum b_{k} y_{n-k} \right] z^{-n}$$

Using the equality  $X(n-k) = z-k_{X(n)}$  this can be rewritten and rearranged as:

EQ. 2

$$Y(z) = \sum a_k z^{-k} X(z) - \sum b_k z^{-k} Y(z)$$

The transfer function of the filter is therefore:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum a_{k} z^{-k}}{1 + \sum b_{k} z^{-k}}$$

When working with linear time-invariant systems, filters can be cascaded (either serially or in parallel) to obtain a desired transfer function as shown in Figure 3-1.1. A second order filter has a transfer function of the form:

$$H(z) = \frac{a_0 z^0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}}$$
$$= A \frac{z^2 + Dz + E}{z^2 + Bz + C}$$

This is a basic building block for digital filters and is called a biquadratic filter with a gain of A. By cascading a number of these sections and selecting the proper values for the coefficients, it is possible to produce any frequency response with the same structure, e.g., high-pass, low-pass, band-pass or band-stop filter. Computer programs are available to assist the filter designer in choosing the proper coefficients for control of filter skirts, ripple, etc. These programs also allow the designer to select which parameter to compromise when the desired performance is unrealisable.

Implementation of a digital filter is frequently a hardware embodiment of the equations expressed above. Adders and multipliers perform the arithmetic on data that have been appropriately delayed in registers. A filter section might use an adder for each addition and a multiplier for each multiplication or it might sequence data through a single unit. Cascading of sections could also take the form of multiple hardware sections or could be

EQ. 1

a single section with programmable coefficients that operates iteratively on the data. A block diagram of a second order filter section is shown in Figure 3-1.2. If the number of terms in the sums is the same (as is shown in the figure), it is possible to rearrange the blocks to reduce the number of delay registers to a single set (sometimes called the canonical form). This is done by factoring the transfer function to separate the numerator and denominator:

$$H(z) = \sum a_{k} z^{-k} * \frac{1}{1 + \sum b_{k} z^{-k}}$$

The numerator is implemented by the hardware shown in Figure 3-1.3 and the denominator by that in Figure 3-1.4. When the two halves are cascaded, a single set of delay registers can be used as shown in Figure 3-1.5.

Implementation of an FIR filter is a simple sum-ofproducts as shown in the block diagram in Figure 3-1.6 Since the number of terms in an FIR filter is significantly larger than the number in an IIR filter with equivalent performance, most implementations use algorithms that reduce the computational requirements wherever possible by taking advantage of symmetry or by decimating the





data. Symmetry trades a less costly computation (addition) for a more expensive one (multiplication). Since the coefficients for an FIR filter are usually symmetrical

#### $(a_k = a_{n-k}),$

it is possible to reduce the number of multiplications required by adding data with the same coefficient before multiplying by the common coefficient as shown in Figure 3-1.7.

Decimation is the process of reducing the sampling rate when the frequency response is limited. For example a signal containing only frequencies below 5 kHz and sampled at 20 kHz can be decimated by 2. Also, a band pass filter can be converted to a low-pass filter and the resulting data could then be decimated with a corresponding reduction in computation requirements. Decimation by N is accomplished by only processing every Nth data sample and discarding the remaining data samples. This reduces the computations of an FIR filter by a factor of N. Decimation does not reduce the number of computations for an IIR filter since the recursive nature of the filter requires that all the previous outputs be computed for the current output.

Digital filters are also used for interpolation. This is an increase in the sampling rate (the opposite of decimation) and is used to produce the data points between the actual samples. The interpolation procedure is to insert N–1 zeros between each data sample and pass the data through a low-pass filter. Since the data insertion is only conceptual (only calculations with non-zero data points are performed), an FIR filter has the same computational advantage over an IIR filter for this operation as for decimation.

Since both IIR and FIR filters can be designed for equivalent performance, it may not be obvious why the FIR filter with its greater number of computations would ever be used. One important characteristic of an FIR filter is that it can be designed to have linear phase response which is not realizable with either IIR or analog filters. A second advantage of an FIR filter is that roundoff errors are limited and easily controlled. The recursion in an IIR filter means that roundoff errors are cumulative.

Another advantage of the FIR implementation of a filter is the ability to partition the processing for the filter into data sets for multiple filter circuits and



thus to increase the throughput of the system through parallel processing. This is not possible for IIR filters since the previous outputs are required for calculation of the current output. Although a cascaded IIR filter achieves a similar result, i.e., the processing can be distributed to different pieces of hardware operating in parallel, the complexity of the filter, or lack thereof, limits the potential gain. An FIR filter could use ten circuits to obtain a tenfold throughput gain or twenty circuits to obtain a throughput gain of twenty. An IIR filter might contain five secondorder sections and is limited to a potential throughput increase of five.

Although stability of the FIR filter is sometimes cited as an advantage, it is not difficult to design a stable IIR filter, but the designer must pay attention to the warnings from the filter design program.



#### 3.2 FILTERING IN THE FREQUENCY DOMAIN—FOURIER TRANSFORM

Fourier transforms are used to change time domain data into the frequency domain when the processing required involves the measurement of spectra. The processing in the frequency domain is so much simpler compared to that in the time domain to warrant the additional transformations between time and frequency domains. Fourier transforms are performed on a limited series, usually a power of 2 to take advantage of the efficiencies of the fast algorithm described below. The Discrete Fourier Transform (DFT) of a limited series { x (n) }, 0 \le n \le N-1 is defined by:

$$X(k) = \sum_{n=0}^{N-1} X(n) * e^{j\frac{2\pi n k/N}{n}} \text{ for } k = 0, 1, ..., N-1$$

Although the transform can be calculated directly by performing the summation of all the products in the equation, the number of calculations becomes prohibitive as the transfform size increases. Defining  $\{x (n)\}$  as a series of N complex numbers, there are  $(N-1)^2$  complex multiplications and N(N-1) complex additions to be performed. The following table illustrates the rapid increase in processing requirements as N increases.

| N    | Multipli-<br>cation | Addi-<br>tion |
|------|---------------------|---------------|
| 64   | 3969                | 4032          |
| 256  | 65025               | 64280         |
| 1024 | 1046529             | 104755        |

These numbers represent complex operations which translate to four real multiplies and two real adds for each complex multiply and two real adds for each complex addition. Fortunately, it is possible to reduce the number of calculations necessary for the transform by organizing the sequence in which the calculations are performed. Because of the periodic nature of  $e^{(-j*2\pi*n*k/N)}$ . product terms in the series appear in more than one summation and need not be recalculated for every occurrence. The Fast Fourier Transform removes redundant calculations by (FFT) repeatedly separating the initial series into two half series until the operation is reduced to a calculation with two data points. There are two different algorithms for doing this-decimation in time and decimation in frequency.

Since the equations defining the DFT and FFT do not contain time or frequency as a variable, the calculations can be applied equally well to data from audio sources to microwave frequencies. For example, if N =1024 data samples are taken at 1msec intervals, each frequency output will represent

#### k \*1kHz/1024

and if the sample interval is 1 $\!\mu\text{sec},$  then each output will represent

k \*1MHz/1024.

The spectrum is also continuous so that the filter output 0 is adjacent to filter output 1023 and it is sometimes convenient to think of the filters as being positive and negative, i.e., filters 0-1023 become filters 0 to 511 and -512 to -1.

Although the FFT is more efficient when computing a full spectrum, some applications are only interested in a limited band of frequencies. In these cases the FFT algorithm calculates many outputs which are not of interest and discarded. In these applications, calculating a small number of DFTs may be more efficient than calculation of the entire spectrum with an FFT. Examples of this restricted band of interest are carrier detection and target tracking in radar or sonar.

#### 3.2.1 Algorithm for Decimation in Time

Assuming that N is an integral power of two (N =  $2^{r}$ ), we can separate { x (k) } into two half series:one sequence with even indices

 $\{x_{\vdash}(n)\} = \{x(2n)\}$ 

one sequence with odd indices

 ${x_{O}(n)} = {x(2n+1)}.$ 

The DFTs of the sequences constructed over

{ x E(n) } and { x O(n) } are  
{ X E(k) } and { X O(k) } with  
X\_E(k) = 
$$\sum_{n=0}^{(N/2)-1} X(2n) * e^{j^{2\pi n k/(n/2)}}$$
  
X\_O(k) =  $\sum_{n=0}^{(N/2)-1} X(2n+1) * e^{j^{2\pi n k/(n/2)}}$ 

The notation is simpler if we use  $W_N = e^{(-j*2\pi/N)}$ and the transform of the original series becomes the sum of the transform of the series of even indices and the transform of the series of odd indices multiplied by a twiddle factor.

$$X(k) = \sum_{n=0}^{(N/2)-1} X(2n) W_N^{2nk} + \sum_{n=0}^{(N/2)-1} X(2n+1) W_N^{(2n+1)k}$$

$$= X_{E}(k) = W_{N}^{\kappa} * X_{O}(k)$$

Since  $W^2_n\,nk$  is periodic with a length of N/2 and that  $W^k_n+N/2=-W^k_n$  we can write

$$X(k+N/2) = X_F(k+N/2)+W_N^{k+N/2} X_O(k+N/2)$$

$$= X_{E}(k) - W_{N}^{k} * X_{O}(k)$$

This calculates X(k) and X(k+N/2) in terms of X<sub>E</sub>(k) and X<sub>O</sub>(k) which is represented graphically by a simple butterfly:



This butterfly is usually simplified to:



The sequences

 ${x (2n)}$  and  ${x (2n + 1)}$  can each be separated into odd and even again:

 $\{x (2n)\}$  can be decomposed into  $\{x (4n)\}$  and  $\{x (4n + 2)\}$ 

 ${x (2n + 1)}$  can be decomposed into  ${x (4n + 1)}$  and  ${x (4n + 3)}$ 

This results in X(k), X(k+N/4), X(k+N/2) and X(k+3N/4) being calculated from the transforms of each quarter series  $X_{Q0}(k)$ ,  $X_{Q1}(k)$ ,  $X_{Q2}(k)$  and  $X_{Q3}(k)$  in two stages as shown in the following equations:

$$X_{E}(k) = X_{Q0}(k) + W_{N}^{2k} * X_{Q2}(k)$$
$$X_{O}(k) = X_{Q0}(k) W_{N}^{2k} * X_{Q2}(k)$$

$$\begin{split} X_{\rm E} \left( k{+}N{/}4 \right) &= X_{\rm Q1} \left( k \right){+}W_{\rm N}^{2k_{\star}} X_{\rm Q3} \left( k \right) \\ X_{\rm O} \left( k{+}N{/}4 \right) &= X_{\rm O1} \left( k \right) {-}W_{\rm N}^{2k_{\star}} X_{\rm O3} \left( k \right) \end{split}$$

Because  $N = 2^r$  we can continue decomposing "r" times until the summations in the butterfly calculation become the terms x(k). For N = 8 samples we obtain the results shown in Figure 3-2.1.

The input data in Figure 3-2.1 was ordered according to the needs of decomposition into odd and even series which has the property that the results occur in natural order. On the other hand, if the original data was in chronological order, the results would be in bit-reversed order. Bit-reversal refers to the binary representation of the sequential position of the data values. For a sequence  $N = 2^r$  we read the binary code (r bits) of the initial position backwards and obtain the binary code of the bit-reversed position.

For N = 8 the values are:

| Binary<br>Value | Bit-Reversed<br>Value |  |
|-----------------|-----------------------|--|
| 000             | 000                   |  |
| 001             | 100                   |  |
| 010             | 010                   |  |
| 011             | 110                   |  |
| 100             | 001                   |  |
| 101             | 101                   |  |
| 110             | 011                   |  |
| 111             | 111                   |  |

Note that the input data order when arranged for decimation in time is bit-reversed order so that the reordering process is the same for either input or output. Figure 3-2.2 shows the butterflies when the input data is ordered (and the results are bit-reversed) for N = 8 samples.

#### 3.2.2 Algorithm for Decimation in Frequency

The decimation in frequency algorithm decomposes the initial series into two consecutive series.

$$X_1$$
 (n) = X(n)  
for n = 0, 1, ..., N/2 - 1  
 $X_2$  (n) = X(n+N/2)

Using the preceding notation, the transform of the sequence can be written:

$$\begin{split} X(k) &= \sum_{n=0}^{(N/2)-1} X(n) * W_N^{nk} + \sum_{n=N/2}^{n-1} X(n) * W_N^{nk} \\ &= \sum_{n=0}^{(N/2)-1} X_1(n) * W_N^{nk} + \sum_{n=0}^{(N/2)-1} X_2(n) * W_N^{(n+N/2)k} \\ &= \sum_{n=0}^{(N/2)-1} \left[ X_1(n) + X_2(n) * W_N^{k,N/2} \right] * W_N^{nk} \end{split}$$

Since  $W_n^k N/2$  is equal of -1 when k is odd and equal to +1 when k is even, the even and odd samples can be written:

$$\begin{split} X(2k) &= \sum_{n=0}^{(N/2)-1} \left[ X_1(n) + X_2(n) \right]^* W_N^{2nk} \\ X(2k+1) &= \sum_{n=0}^{(N/2)-1} \left[ X_1(n) - X_2(n) \right]^* W_N^{(2k+1)n} \end{split}$$

Using  $W_n^2 n/k = e^{-j2\pi * 2nk/N} = e^{-j2\pi * nk/N/2} = W_{N/2}^{nk}$  the equations become:

$$\begin{split} X(2k) &= \sum_{n=0}^{(N/2)-1} \left[ X_1(n) + X_2(n) \right]^* W_{N/2}^{nk} \\ X(2k+1) &= \sum_{n=0}^{(N/2)-1} \left[ X_1(n) - X_2(n) \right]^* W_N^n * W_{N/2}^{nk} \end{split}$$



Figure 3-2.1



These equations show that the samples of the DFT can be obtained from the N/2 point DFTs of the sequences f(n) and g(n) where

$$f(n) = X_1(n) + X_2(n)$$
$$g(n) = \left[ X_1(n) - X_2(n) \right]^* W_N^n$$

This can be represented by the following butterfly:



As in the decimation in time algorithm we can decompose each series into two sequences repeated "r" times. For N = 8 samples we get the graphic of Figure 3-2.3.

Data order is the same as for decimation in time; we have to order the initial values in bit-reversed order (see Figure 3-2.4 for N = 8) in order to get the results in natural order.

The decimation in time and decimation in frequency algorithms are very similar and call for the same number of arithmetic operations. The calculations can be made in-place (results replace the operands used to calculate them) but we must arrange the initial sequence in bit-reversed order to get the results in natural order or we can leave



the initial sequence in chronological order and rearrange the resulting sequence. Reordering the inputs or the results are both performed according to the same rule, i.e., bit-reverse the binary representation of the position. It is also possible to avoid the rearrangement by using a double buffer memory and placing the butterfly results in the alternate buffer instead of in-place. The buffers would be swapped at the end of each FFT iteration to make the output of the previous iteration the current input data.

#### 3.2.3 Comparison of FFT and DFT

The algorithms of the Fast Fourier Transform (FFT) requires  $r = log_2 N$  steps. At each step there are N complex additions and N/2 complex multiplications. Therefore, the total number of operations is:

#### N/2 log<sub>2</sub> N multiplications N log<sub>2</sub> N additions

The following table compares these results to the ones of the Discrete Fourier Transform.

|      | Discrete<br>Fourier<br>Transform |         | m    |       |
|------|----------------------------------|---------|------|-------|
| N    | (*)                              | (+)     | (*)  | (+)   |
| 64   | 3969                             | 4032    | 192  | 384   |
| 256  | 65025                            | 65280   | 1024 | 2048  |
| 1024 | 1046529                          | 1047552 | 5120 | 10240 |

As mentioned previously, this efficiency only applies when a large portion of the spectrum is of interest. If only three frequency outputs were required, the DFT table becomes:

| Ň    | (*)  | (+)  |  |
|------|------|------|--|
| 64   | 189  | 189  |  |
| 256  | 765  | 765  |  |
| 1024 | 3069 | 3069 |  |

#### 3.2.4 Inverse Fourier Transform

The inverse Fourier transform of a signal transforms data from the frequency domain to the time domain and is calculated by:

$$X(n) = 1/N \sum_{n=0}^{N-1} X(k) * e^{+j^{2\pi nk/N}}$$

The similarity to the forward transform allows us to use the same algorithms for the inverse transform by changing the sign of the angle from (- $2\pi$ +n\*k/N) to ( $2\pi$ \*n\*k/N) or W<sub>N</sub><sup>nk</sup> to W<sub>N</sub><sup>-nk</sup> The scale factor 1/N is typically incorporated in the gain of the system and ignored for the purposes of calculation of the transform.

#### 3.2.5 Radix 4 FFT

It is possible to further reduce the number of calculations required to perform the FFT by using a radix 4 algorithm. If the number of samples N is represented by  $N = 4^x$ , for x = a positive integer, the butterflies of the preceding algorithms can be rearranged in groups of four. For the decimation in time algorithm with N = 4 we have to perform the operations shown by Figure 3-2.5. Figure 3-2.6 shows the same thing for N = 16. Figures 3-2.7 and 3-2.8 show the requirements for reordering the data for N = 16 but this time using the algorithms of decimation in frequency. The improvement in efficiency is shown in the following table:



Figure 3-2.5





| Radix 2 |      |       | Ra   | dix 4 |
|---------|------|-------|------|-------|
| N       | (*)  | (+)   | (*)  | (+)   |
| 64      | 192  | 384   | 144  | 384   |
| 256     | 1024 | 2048  | 768  | 2048  |
| 1024    | 5120 | 10240 | 3840 | 10240 |

As shown in the table the advantage of the radix 4 algorithm is in reducing the number of multiplications by 25%.

# 3.2.6 Real-valued Input Fourier Transforms

Although the FFT processes complex data, the input to a system is frequently real data only. One way to process real-valued input data is to zero-fill the imaginary data and perform the FFT as though the inputs were complex. However, this is wasteful of memory and processing time since half the outputs are discarded when the FFT is complete. One way to overcome this inefficiency is to combine two series of real inputs and process them as a complex number series. This produces two spectra in the processing time required for a single zero-filled data set. However, it does require an additional processing step to separate the outputs into two spectra. If

is treated as the real part of the data and  $\{x_2(n)\}$ 

is treated as the imaginary part, the two spectra can be separated by calculating:

$$X_{1} (k) = 1/2 \left\{ \left[ X_{R} (k) + X_{R} (N - k) \right] + j \left[ X_{1} (k) - X_{1} (N - k) \right] \right\}$$
$$X_{2} (k) = 1/2 \left\{ \left[ X_{1} (k) + X_{1} (N - k) \right] + j \left[ X_{R} (k) - X_{R} (N - k) \right] \right\}$$

This additional processing step is much less than the time that would be required to perform a second FFT.

A second algorithm takes a single series and treats every even data point as the real part and every odd data point as the imaginary part of the data. After performing an N/2-point FFT, the outputs must be calculated using the following:

$$\begin{aligned} X(k) &= 1/2 \quad \left[ X_{R}(k) + X_{R}(n-k) + \cos \hat{\Theta} (X_{1}(k) + X_{1}(N-k)) \right] \\ &- \sin \Theta (X_{R}(k) - X_{R}(N-k)) + j/2 \quad \left[ X_{1}(k) + X_{1}(n-k) \right] \\ &- \sin \Theta (X_{1}(k) + X_{1}(N-k)) + \cos \Theta (X_{R}(k) - X_{R}(N-k)) \right] \end{aligned}$$

#### for $\Theta = \pi k/N$

In this case the additional processing is traded off against the time to perform an N-point FFT instead of an N/2-point FFT. The additional processing is more than the previous method but requires less memory since only one data set is processed instead of two data sets simultaneously. The data flow through the system is also more regular, and that is sometimes important.

A third algorithm developed by Beraland eliminates the calculations which produce redundant results. This method does not use the complex FFT algorithm; it treats the data as real or imaginary numbers during the calculations instead of operating on complex pairs. It also contains two butterfly configurations instead of the single butterfly used in complex algorithms. These butterflies are diagrammed in Figure 3-2.9. Butterfly type one is used when k = 0 and butterfly type two is used for all other coefficients. This algorithm also produces outputs in scrambled order as shown in Table 3-1 with the real and imaginary parts in consecutive positions in the The first two elements are the sequence. exception. These are the real parts of the first and



Figure 3-2.7



last frequency filters, which have imaginary parts equal to zero. Although the order appears similar to bit-reversed order, it does not have a simple translation and must be generated algorithmically from the previous sequence. The new sequence algorithm is as follows:

- 1) Even elements of the new sequence (starting with element 0) are the previous sequence.
- Odd elements are inserted by subtracting the even element from the length of the new sequence except for element 1 which is equal to 1/2 the length of the sequence.

#### 3.3 MAGNITUDE CALCULATION

An operation that is frequently necessary in signal processing is the computation of the magnitude of complex samples. This is particularly common when performing FFT processing. A magnitude can be calculated directly as the square root of the sum of the squares of the quadrature components. However this algorithm has the disadvantage of requiring double precision arithmetic for the calculations as well as being complicated and time consuming. A machine performing 16-bit operations would require 32-bit arithmetic to sum the products and the square root computation typically requires one iteration per bit in the result.

Since the computation of the square root of the sum of the squares is such a complex task, many alternatives to the magnitude calculation have been developed. The simplest approach to the calculation is to use a linear combination of the quadrature components. This takes the form M = ax+by for x = max(|R|,|I|) and y = min(|R|,|I|), i.e. x = abs(larger) and y = abs(smaller). Systems in which a shift is less expensive than a multiply (in time, hardware or both) typically use coefficients like a =1 and b =1/2. However a system which calculates a product as easily as a shift can obtain a more accurate magnitude with a = 0.960 and b = 0.398. Even more accuracy can be obtained by changing the coefficients as a function of the angle. Although the magnitude is independent of angle, the approximation error has an angle dependency equal to  $1-(a*\cos(\Theta)+b*\sin(\Theta))$  for  $0<\Theta<\pi/4$ , which is the region defined by the larger/smaller equation stated above. The angle dependency of the algorithm can be kept simple by defining only two regions separated at an angle where the tangent is a simple ratio such as where one quadrature component is twice the other. An example of this approximation is to use a = 1 and b = 1/4 when x>2y and a = 3/4 and b = 3/4 for x<2y.

Another approach to calculation of the magnitude is to use Cartesian-to-polar coordinate conversion

|     |                                  | Table                            | 3-1                              |                                                                        |
|-----|----------------------------------|----------------------------------|----------------------------------|------------------------------------------------------------------------|
| N = | 4                                | 8                                | 16                               | 32                                                                     |
|     | 0 <sub>R</sub>                   | 0 <sub>R</sub>                   | 0 <sub>R</sub>                   | 0 <sub>R</sub><br>16 <sub>R</sub>                                      |
|     |                                  |                                  | 8 <sub>R</sub>                   | 8 <sub>R</sub><br>8 <sub>I</sub>                                       |
|     |                                  | 4 <sub>R</sub>                   | 4 <sub>R</sub><br>4 <sub>I</sub> | 4 <sub>R</sub><br>4 <sub>I</sub><br>12 <sub>R</sub><br>12 <sub>I</sub> |
|     | 2 <sub>R</sub>                   | 2 <sub>R</sub><br>2 <sub>I</sub> | 2 <sub>R</sub><br>2 <sub>I</sub> | 2 <sub>R</sub><br>2 <sub>I</sub><br>14 <sub>R</sub><br>14 <sub>I</sub> |
|     |                                  |                                  | 6 <sub>R</sub><br>6 <sub>I</sub> | 6 <sub>R</sub><br>6 <sub>I</sub><br>10 <sub>R</sub><br>10 <sub>I</sub> |
|     | <sup>1</sup> R<br>1 <sub>I</sub> | 1 <sub>R</sub><br>1 <sub>I</sub> | 1 <sub>R</sub><br>1 <sub>I</sub> | 1 <sub>R</sub><br>1 <sub>I</sub><br>15 <sub>R</sub><br>15 <sub>I</sub> |
|     |                                  |                                  | 7 <sub>R</sub><br>7¦             | 7 <sub>R</sub><br>7 <sub>1</sub><br>9 <sub>R</sub><br>9 <sub>I</sub>   |
|     |                                  | 3 <sub>R</sub><br>3 <sub>I</sub> | 3 <sub>R</sub><br>3 <sub>I</sub> | 3 <sub>R</sub><br>3 <sub>I</sub><br>13 <sub>R</sub><br>13 <sub>I</sub> |
|     |                                  |                                  | 5 <sub>R</sub><br>5 <sub>I</sub> | 5 <sub>R</sub><br>5 <sub>I</sub><br>11 <sub>R</sub><br>11 <sub>I</sub> |

algorithms. The quadrature components are a vector in Cartesian coordinates and conversion to a magnitude-phase pair gives the desired result. A direct calculation of M = y/sin(arctan(y/x)) involves divisions and arctangents which are not attractive alternatives to a square root calculation. However, once the phase is determined, multiplication can be used to rotate the vector onto the x-axis where the value of the real component and magnitude are identical. This reduces the problem to calculation of the arctangent usually by means of a lookup table. Division can be eliminated by concatenating the two components of the vector to form the address of the table, but this double length address results in a prohibitively large table. The simplest approximation is to normalize the vector by left-justifying the larger component, and to use a limited number of MSBs of each component to address the table, eq. using bits x15-10 and y15-10 of 16-bit numbers to address a table containing 4096 angles. A more accurate approximation which also removes the necessity for normalization uses the most significant bits of the larger component, the most significant bits of the smaller component and the bit position of the MSBs of each component (or the number of leading zeros). This might use two data bits and a four bit position for each component of the vector to address a table of 4096, e.g., bits x<sub>12-11</sub> and position 12 for one component and bits y8-7 and position 8 for the other (assuming bits x15-12 are all the same and y<sub>15-8</sub> are also the same). Because the arctangent is well behaved for small angles, this algorithm can be iterated for greater accuracy. The previous one can not be iterated since bits y15-10 become nonsignificant as the vector is rotated toward the x-axis. Addressing a table with the number of leading zeros of each component and the single bit following the leading one can produce a magnitude with 1.4% accuracy. Additional accuracy can be obtained by using more bits following the leading one or by iteration. Table size can also be traded off by using the difference between the number of leading zeros of each component to address the table rather than the individual counts. Both of the approximations could produce the sine and cosine directly from the table but the angle is sufficient when a sine and cosine table is already available for FFTs.

Another approximation which rotates the vector onto the x-axis can be used if multiplication is too CORDIC (COordinate Rotation Digital costly. Computer) rotation is one of a class of algorithms developed by Volder in 1956. The key elements of this algorithm are successive approximation and elimination of multiplication. The vector is placed in the first quadrant by taking the absolute values of the components and is then rotated toward the xaxis by adding or subtracting a series of angles. Each angle is added if the phase of the vector is negative or subtracted if the phase is positive with the result that the x component of the vector becomes the magnitude as the phase angle approaches zero. Positive or negative phase is the same as a positive or negative v value which makes the determination of this condition straightforward. The angles are selected so that their tangents are simple binary ratios which allows rotation to take the form

$$x_n = x_{n-1} + y_{n-1}/2^r$$
 and  
 $y_n = y_{n-1} + x_{n-1}/2$ 

where r increments for each iteration. Accuracy of the algorithm is a function of the number of iterations with four iterations producing an accuracy of 0.772%. A side effect of the algorithm is that there is gain, i.e. after four iterations the x component will be 1.643 times the magnitude of the original vector. However the gain is fixed for a given number of iterations and is usually absorbed in the gain of the system.







A common use of the magnitude calculation is for the detection of signals in noise such as radartarget detection or communications carrier detection. The major concern in this application is the signal to noise ratio loss caused by the approximation. An average background value (representing noise) is calculated using an approximation with minimum average error and the signal magnitude is calculated for minimum peak error. Using the simple shifting approximations would mean using larger +1/4 smaller for the average (absolute error = +3, -11%, average error = 0.6%) and larger +1/2 smaller for the signal (maximum absolute error = +11, -0%, average error = 8%). Table 3-2 contains various coefficients and the errors associated with them.

Table 3-2

| Approximation          | Peak<br>Error | Average<br>Error |  |
|------------------------|---------------|------------------|--|
| L+.5S                  | 11.80         | 8.68             |  |
| L+.25S                 | 11.60         | 0.65             |  |
| .960L + .398S          | 3.96          | -1.30            |  |
| L + .25S / .75L + .75S | 6.06          | -3.01            |  |

#### CHAPTER 4 SYSTEM DESIGN

#### 4.1 ARRAY PROCESSOR DESIGN BASED ON THE Am29500 FAMILY

The board described in this chapter was built and tested in AMD Headquarters Applications in early 1985. This application was developed to show designers the efficiency in using the Am29500 Family for digital signal processing. Doing a paper design is one thing; building a working prototype boosts confidence level. This section of the manual will help the reader to design a system and point out what aspects of the design require consideration.

The goal here is to design an optimum cost/performance board. Table 4-1 compares radix-2 butterfly cycle times for different architectures. The architecture chosen for this design is the 2.2.1 architecture: two buses, two ALUs, and one multiplier. With this architecture, each butterfly takes four cycles. Therefore, 5,120 butterflies in a 1K complex point FFT at 10 MHz takes 2 ms. Figure 4-1.1 shows the basic architecture of the board. The two ALUs are 16 bits each for real and imaginary data. Each ALU is comprised of two Am29501s. The two ALUs can also be combined by using the Am2902 Carry Lookahead Generator to form a single, 32-bit, double-precision ALU, The two 16-bit buses supply real and imaginary data from the data memories to the ALUs, and vice versa, via the bidirectional DIO ports of the Am29501s. Data from the ALUs is transferred to the multiplier via the bidirectional MIO ports of the Am29501s. If 32-bit products are required, 16 of the 32 bits could come in from the multiplier into the MIO ports. The 16X16 multiplier is the Am29517. Its X and Y ports are the two 16-bit input ports, and its P port is the 16-bit output port on which the 32-bit product can be multiplexed. The Y port of the multiplier is a bidirectional port and, for single cycle 32-bit multiplies, this port has to be used in conjunction with the P port to get the complete 32-bit product.

The MIO ports of either the real or imaginary ALUs can be directed to the Y port of the multiplier. The X port data can be selected from one of several sources—the MIO ports of either the real or imaginary ALUs, the real data bus, the imaginary data bus, or the coefficient PROMs. The 1-of-4 MUX provides the path required for data flow to the X port. The P port of the multiplier goes to the MI ports of the ALUs.

The data memory is made up of high-speed (45 ns) RAMs. There are two memory banks. While one is being worked on by the DSP algorithm, the host CPU can unload and reload the other. Each memory bank is 1K deep and 32 bits wide, which means that the maximum FFT size can be 1K complex. Data from the host can be DMA'd over to the board or the CPU can use I/O to transport it. DMA is only provided on the 16-bit host buses. Eight-bit host buses must use programmed I/O. Thirty-two I/O addresses are reserved for this board and the address decode logic decodes these addresses and selects appropriate logic on the board.

The Am29540 and the Am29116 generate addresses for DSP algorithms. The Am29540 is an FFT address generator. The transform length, FFT type, etc. are supplied by the CPU to the part, and

|                       |      | (Ra             | dix-2 FF | Т)             |               |       |      |         |
|-----------------------|------|-----------------|----------|----------------|---------------|-------|------|---------|
| For Each<br>Butterfly | Memo | ory Access<br>8 | Add      | /Subtract<br>6 | Multiply<br>4 |       |      |         |
| Resources             | Memo | ory Buses       | ALUs     |                | ALUs          |       | Mult | ipliers |
| # of<br>Cycles        | #    | Usage           | #        | Usage          | #             | Usage |      |         |
| 8                     | 1    | 8/8             | 1        | 6/8            | 1             | 4/8   |      |         |
| 6                     | 2    | 4/6             | 1        | 6/6            | 1             | 4/6   |      |         |
| 4                     | 2    | 4/4             | 2        | 3/4            | 1             | 4/4   |      |         |
| 3                     | 4    | 2/3             | 2        | 3/3            | 2             | 2/3   |      |         |
| 2                     | 4    | 2/2             | 4        | 1.5/2          | 2             | 2/2   |      |         |

#### Table 4-1 "Optimum" Cost/Performance (Radix-2 FFT)

the part puts out the correct sequence of data and coefficient addresses. The Am29116 provides addresses for the filter and matrix algorithms.

The Am29520 is the address pipeline register for the data memory. The Am29821 is the address pipeline register for the coefficient PROMs for the FFT and filter algorithms.

The last block in the architecture is the microprogram control unit. The microword width is 128 bits. The code can be up to 2K deep. High-speed (35 ns) registered PROMs are used to store the code. The sequencer is the Am2910A. Two Am2922s allow the sequencer to test up to 16 different conditions.

The detailed architecture of the board is explained in the following five sections.

- a) Arithmetic
- b) Memory
- c) Addressing
- d) Control
- e) I/O

#### 4.1.1 Arithmetic

Figure 4-1.2 shows a detailed diagram of the arithmetic section.

Data scaling must be considered when designing an FFT board. This is necessary in fixed point systems, to ensure that the results do not overflow. The approach taken here is "block scaling," wherein all data is scaled by a certain amount at each pass. This kind of scaling can be done at the input to or the output from the ALU. A shifter at the input to each ALU serves the purpose without restricting the input data. When the shifter is at the output, the input data is restricted so that there is no overflow on the first pass through the pipeline. Overflow can occur during complex mutiplication when summing R\*R-I\*I or R\*I+I\*R because of inaccuracies due to truncation. Overflow could also occur when adding or subtracting, so two bits of overflow should be allowed when performing an FFT butterfly. Scaling involves right shifting of the data. Being able to shift up to 3 places is sufficient and thus 4-bit shifters (Am25S10), not barrel shifters, Figure 4-1.3 shows how four are used. Am25S10s are connected to form a 16-bit shifter for each ALU. They are connected so that when shifting a 2's complement number, the sign bit gets copied into the shifted position. The real and imaginary buses bring data into the DIO ports of the ALUs via these shifters. Two microcode bits

control the amount of shift. The return paths from the DIO ports of the ALUs to the memories are buffered by the Am29827 10-bit buffers. The Output Enables of the shifters and buffers are controlled by the Write Enable lines from the microword to the data memories to ensure that bus contention does not occur.

The FFT algorithm uses the formula A±WB where A and B are the complex data points and W is the complex coefficient. Multiplication is only between B and W. So for the FFT algorithm, it is sufficient to have the coefficient as one input to the multiplier and the ALUs as the other input. Squaring to form magnitude, on the other hand, requires а multiplication between "data" inputs. No coefficients are involved in matrix multiplication. Thus, for this algorithm, both multiplier inputs must connect to the ALUs. Sum-of-product-type calculations may require that data flow directly from the memory into the multiplier, bypassing the ALU. Finally, complex arithmetic requires that data flow from either ALU to the other. All of these data paths exist in this design. The output of a 16-bit, 1of-4 MUX, made up of eight 74LS253s, is connected to the X input of the multiplier. The four possible X inputs are: the coefficient PROMs, the real data bus from memory, the imaginary data bus from memory, and either ALU. A transceiver between the two ALUs, appropriately controlled, prevents bus contention and allows either ALU to be connected to either multiplier input. The transceiver also allows bidirectional data flow from one ALU MIO port to the other. The MUX and transceiver are controlled from microcode. The Y input of the multiplier is a bidirectional port. If a 32bit product is required. 16 of the product bits can be multiplexed on this port and input into the ALUs via their corresponding MIO ports. The other 16 bits of the 32-bit product can go from the P port of the multiplier to the MI ports of the ALUs.

Some algorithms may require double-precision arithmetic. The Am2902 is a Carry Lookahead Generator that allows cascading of all four Am29501s to form a 32-bit ALU. Figure 4-1.4 shows how the ALUs and lookahead generator are interconnected to allow this. Here the Most Significant Byte for double-precision is the Imaginary MSB and the Least Significant Byte is the Real LSB. The RALUOM bit from microword controls whether the configuration is single-precision or double-precision. When this bit is High, the propagate and generate signals from the imaginary MSB are forced High, thus forcing  $C_{N+Y}$  from the Am2902 Low and effectively enabling single-precision.







Figure 4-1.3



#### 4.1.2 Memory

The data memory is an important consideration in the design. It must be designed to support the high-speed architecture. The minimum requirement is one memory bank, to be toggled between the host system and the DSP Processor. But this means that the DSP Processor is idle during the time data is unloaded and new data reloaded. Real-time applications would typically sample data continuously at a fixed rate. Results of the process would be read continuously from the data memory. Thus, it is necessary to have two memory banks so that, while one bank is being unloaded and reloaded, the other bank is processed by the DSP. This restricts this design to in-place FFTs.

Figure 4-1.5 shows the memory section architecture of this design. High-speed (45 ns) Am2148 memories make up the two memory banks designated L (Left) and R (Right). Each memory bank is 1K deep and 32-bits wide. Since FFTs operate on complex data, each memory bank is further divided into 16-bit real and imaginary parts designated by R and I respectively. One line, Q (described in the Control section), controls memory bank switching. To allow parallel operations, two address buses (designated Left and Right) supply addresses simultaneously to the two memory banks. These address buses switch when Q switches so that DSP data addresses are provided to the memory on which the DSP is processing and host data addresses are provided to the other memory bank for unloading and reloading of data.

The DSP data buses are 16 bits wide. Transceivers (Am2947s) isolate one memory bank from the other on the DSP side. Q controls the Chip Enables of these transceivers so that only the appropriate set of transceivers are enabled at any time. The direction of these transceivers is controlled by the real and imaginary Write Enable bits from microcode.

Host data buses can be 8 bits or 16 bits wide. The three inverting transceivers (Am2946) at the top of Figure 4-1.5 are connected to accommodate both sizes. If the host bus is 16 bits wide, transceivers 1 and 3 are enabled by CD3 so that data is transferred directly in 16-bit words. If the host bus

is 8 bits wide, data must be transferred in bytes. First the low byte is transferred through transceiver 3 and then the high byte is transferred via transceiver 2. Notice that when the low byte is written, transceiver 1 is also enabled and invalid data enabled on the high byte. This is followed immediately by the true high byte. When data is read back by the host, there is no problem because the host bus is only 8 bits wide. Control signals CD2 and CD3 are generated from a PAL device.

The direction of these transceivers is controlled by the system Read signal, IOR, such that the transceivers are directed into the board by default.

The Am2947 transceivers on the host side of the section are to isolate the memory banks on that side. Their Chip Enables (CELR, CELI, CERR, CERI) are generated in a PAL device. Control signal Q is used to distinguish between the left and right memory banks. The system Read signal, IOR, also controls the direction of these transceivers.

Eight Write Enable lines, WE1 through WE8, are produced in a PAL device. Data Writes from the DSP side are in 16-bit words only. Two microcode bits, one for real data and the other for imaginary data, are used to generate the Write signals from the DSP side. Data Writes from the host side can be accomplished by DMA or I/O. DMA is allowed only for 16-bit transfers. The Am9517A is the DMA controller being used in the design. Since I/O Writes can be in either byte or word mode, two I/O addresses are reserved for this. When a decode of these two addresses occurs, a PAL device produces two signals, BYTEH for the high byte and BYTEL for the low byte. When the bus is used in 16-bit mode. the MULTIBUS\* produces a control signal, BHEN. A combination of these three signals is used to produce the eight Write Enables during data loading via I/O.

#### 4.1.3 Addressing

To achieve parallel operation, both memory banks are addressed simultaneously, one by the host for unloading and reloading of data and the other by the DSP address generator. Figure 4-1.6 shows the architecture of this section.

Addressing from the host processor must accommodate DMA or programmed I/O. For this design, only 16-bit DMA transfers are allowed. Programmed I/O transfers, on the other hand, can be 8 bits or 16 bits. Host addressing is done by a "fly-by" counter. The counter is preloaded with the starting address from the host processor. The counter is clocked by the Read or Write signal produced by the Am9517 DMA Controller, if DMA is being used, or by the Read or Write signal from the host system, if programmed I/O is being used. A PAL device produces these clock signals for the fly-by counter. The 12-bit Counter is made up of three Am74161s (4-bit presettable counters).

The Least Significant Bit of the fly-by counter goes into the PAL device that produces the eight Write Enable signals for the memory. It distinguishes the real data from the imaginary data. The next 10 bits of the counter address the 1K deep memory. The address for a complex word of memory is the same. Thus in the 16-bit mode, the address from the flyby counter to the memory remains unchanged for two consecutive clocks and the Least Significant Bit of the fly-by counter helps to generate a Write signal for either the real or imaginary part of the complex word. For 8-bit I/O transfers, the fly-by counter is clocked once every two 8-bit Writes so that the Least Significant Bit of the fly-by counter still distinguishes between real and imaginary data.

This design supports the following DSP processes: 1)Fast Fourier Transforms, 2) Filters, and 3) Matrix Multiplication. Addressing for the FFT is quite complex but the Am29540 provides a hardware solution. Addresses for data source, data destination and coefficients are generated by the Am29540 FFT Address Sequencer. The microcode indicates to the address sequencer the FFT type (radix 4/2; inplace, non-inplace; DIT/DIF). Four bits from the Instruction Register (described in the Control Section) indicate the transform length to the sequencer. The transform length is latched into the part at the start of the process. That's all that is required for initialization. The sequencer now produces data and coefficient address in the required order for the entire transform.

The Am29116 is programmed to produce the address sequence for Filters and Matrix Multiplication. Since the board runs just one process at a time, the Am29540 and the Am29116 are never used simultaneously. Therefore the microcode bits for the two parts are overlayed. The FFT transform length four bits from the IR indicate to the Am29116 the type of filter or the matrix size.

The data addresses must be saved in a pipeline register for efficient microcoding. The Am29520 serves as a dual, two-level pipeline register. The source addresses for the two complex inputs are saved in one level. These are moved into the second level to become the destination addresses for the results and new source addresses get put into the first level. Four bits of microcode control the two Am29520s that are connected in parallel to form the 12-bit-wide pipeline register for the data addresses.





ယ္ထ

The coefficient addresses need a simple, onelevel pipeline register and the Am29821 10-bit register serves the purpose. For the FFT process, there are 16-bit sine and cosine coefficient (Am29526-Am29529). Two PROMs 8-bit PROMs. Am27S291s. form a third 16-bit coefficient PROM for the filter algorithm. Addresses for the sine and cosine filter coefficient PROMs are generated by the Am29540. Coefficient addresses from the Am29540 are leftjustified and for radix-2 operations, the MSB is alwavs a "0." The Am29116 is programmed so that its MSB is always a "1" when generating filter coefficient addresses. This MSB is therefore used to "Chip Select" between the FFT coefficient PROMs and the filter coefficient PROMs. A microcode bit selects between the sine and cosine during the FFT process.

The two memory banks are addressed simul-

taneously, one by the DSP addressing and one by the host addressing. Two address buses therefore supply these two addresses to the two memory banks. At the end of a process, the two buses must be switched, under control of the flag Q. This is achieved by the four 10-bit buffers, Am29827s. At any one time, two of these buffers are enabled by Q, one supplying host addresses to one bank of memory and the other supplying DSP addresses to the other bank. At the end of a process, these two buffers are turned off and the other two turned on so that the buses switch, effectively switching memory banks.

## 4.1.4 Control

This section addresses the "heart" of the design because the microcode controls the rest of the system. The microcode width must be decided during this phase of the design. A Microprogram



Sequencer is used to sequence through the microcode. A Pipeline Register is required for the microcode bits so that the sequencer can fetch the next microinstruction while the present one is executing. A Condition Code MUX is needed to test conditions. These are the basic necessities of the Control section.

Figure 4-1.7 shows the architecture for this section. The microcode width required for this design is 128 bits: 58 bits for the Real and Imaginary ALUs, 21 overlayed bits for the Addressing section, 6 bits for the Multiplier, 4 bits for the Shift Register, 4 bits for the Address Pipeline Register, 16 bits for the Microprogram Sequencer, 5 bits for the Condition Code MUX, and some other miscellaneous bits. The PROMs used are Am27S45As; they are high speed (35 ns) 2K x 8 registered PROMs. Using registered PROMs decreases part count and saves board space because the Pipeline Register is built into the PROM. The Condition Code MUX has built-in registers and so, to avoid having two registers in its path, a non-registered microcode PROM is used for it. A registered PROM could be used but the microcode for the CCMUX would be shifted by 1 line with respect to the rest of the microcode.

These PROMs have a 2049th location which can be programmed as any value. This value appears at the output of the PROM when an INIT signal is applied to the PROMs. This is a useful feature for initialization on reset. The op-code for a Jump to Zero (JZ) instruction for the sequencer is programmed into this location for initialization on reset. The reset line from the CPU is connected to the INIT input of the PROM.

The sequencer used is the Am2910A, capable of addressing up to 4K of microcode memory. We have 2K deep microcode memory in this design which is enough for the processes desired here.

Two Condition Code MUXs (Am2922s) enable testing of up to 16 inputs, of which one is used for the "forced pass" condition. The outputs of the Condition Code MUX is fed into the CC input of the sequencer for condition testing.

The control section also contains an 8-bit Instruction Register (Am29845). Four bits go to the Am29540 or Am29116 to indicate FFT transform length, filter type, or matrix size. Three bits go into the sequencer to indicate the process to be run. The eighth bit from the host goes into a latch. This is the latch for Control Signal Q that indicates to the entire system which memory bank to process. The state of this eighth bit from the host either sets or resets the Q latch. Also included in the Control Section of the design is the clock circuit. The Am2925 is a clock generator and microcycle length controller. It produces clocks of varying duty cycles.

The following is a sequence of events that would occur from start to finish of any process. On powerup, the reset line from the CPU is activated and the op-code for a JZ instruction is put out from the microcode PROM to the sequencer. This makes the sequencer jump to the start of microcode. The sequencer now waits for an instruction to be loaded into the IR by the CPU.

On receiving an instruction, the sequencer jumps to one of eight locations at the end of the microcode. The exact location is decided by the three bits from the IR to the branch address field of The value of the three bits the sequencer. depends on what process must be run (one of eight). The sequencer jumps to one of eight locations and gets a branch address from there which would be the starting address for the At the end of the process, the process. sequencer would execute a JZ instruction and set a flag. This flag can be read by the CPU via software. The CPU can now load the IR with another instruction. The CPU should not load the IR if the flag from the microcode is inactive, as the sequencer would miss the instruction.

## 4.1.5 Input/Output

The I/O section is the interface between the board and the host system. Decoding logic is required to decode I/O addresses reserved for the board. Also required is the DMA controller and some registers and buffers to interface with the host address and data buses. Figure 4-1.8 is a diagram of the architecture.

The DMA Controller used in this design is the Am9517. The Am9517 is chosen primarily because of its capability to interface easily with an 8-bit CPU. If a 16-bit CPU were being used, the Am9516 would have been selected. Although an 8-bit CPU and a 16-bit DMA seem contradictory, the Multibus allows bus masters with different bus widths to exist in the same system.

I/O addresses must be reserved for this board. The DMA Controller must have 16 addresses reserved for it. Other I/O addresses needed are for I/O memory writes, fly-by counter loading, checking process complete status flag, loading three MSBs of address for DMA (described later in this section), loading the instructions into the IR, and initialization of the board. Thirty-two I/O addresses are reserved for this board.

The Am29809, an 8-Bit Comparator, produces a board select when one of these 32 I/O addresses is put on the bus by the CPU. When using DMA, the controller must be programmed. This is done via Buffer 1 (Am29828). When an address decode occurs for the DMA Controller. Buffer 1 is "output enabled" and the Am9517 is "chip selected" and programmed. The MULTIBUS address bus is 20 bits wide. Since DMA is allowed only in 16-bit mode and this design is for an 8-bit CPU, the LSB of the address bus is grounded. The Am9517 operates in 256-byte pages. The low order 8 bits go from the Am9517 to the host address bus via Buffer 2. The output of this buffer is enabled by the acknowledge line from the controller. The high order 8 bits of the address must be latched into a register. Device #3 on the diagram is an 8-bit register into which these 8 bits are latched. The MULTIBUS address bus being 20 bits wide, the remaining 3 bits must be loaded into a second register (Device #4 on the diagram) by the CPU. An I/O address (signal name ADDRL) is reserved for this function.

An I/O address is also reserved for the process complete status flag. This flag from microcode goes into Buffer 5 and this buffer's output is enabled when the address decode for this flag occurs. The CPU reads the value of the buffer and decides if the DSP process is complete.

Appendix 1 lists the equations for the three PAL devices used in the I/O section.

## 4.1.6 Timing Considerations

This is probably the most critical phase of the desian. "Timing Considerations" refers to the speed at which the design will actually run. This should be done before the board is built. The designer starts out with a certain set of goals which the design has to meet. These goals can be divided into two categories. Category 1 lists the different algorithms that the design has to process. Category 2 lists how fast these processes must be run. The designer starts with the goals in Category 1. He designs his architecture so that all algorithms can be run by the design. He then writes microcode for the processes. Knowing the architecture and having written the microcode, the designer can now evaluate worst case data paths and can compute process times and compare them with the goals in Category 2. All is well if the goals in both categories are met. If not, trade-offs must be made. First, the designer should try to make the microcode more efficient. If this is not sufficiently effective, the architecture needs to be changed or some of the goals need to be relaxed. Changing the architecture usually implies adding more hardware so that the design has more processing power. This is not always possible due to board space and cost limitations. The alternative solution, relaxing some of the goals, could mean either disallowing some of the algorithms thus getting rid of some hardware and thereby eliminating propagation delays and increasing speed, or finally, deciding that the slower speeds are acceptable.

In this design, the Category 1 goals were:

- a) 1K Complex FFT;
- b) Filters;
- c) Matrix Multiplication.

The Category 2 goal was: a) 1K Complex FFT in 2 ms.

The hardware is designed and microcode for the FFT has been written. Each butterfly takes 4 cycles. Now the worst case data path needs to be computed which would decide the minimum cycle time. To achieve the 2 ms goal, the cycle time should be no greater than 100 ns.

All data paths should be and have been considered. Three of the worst paths are:

- 1) Clock to output of microprogram pipeline register
  - + select to output of Am29520
  - + prop delay of bus switching transceivers
  - + data memory access time
  - + prop delay of transceivers separating
  - memory banks on DSP side
  - + shifters
  - + data set up time of Am29501
  - = 141 ns.
- 2) Clock to output of microprogram pipeline reg + INST to output of Am29116
  - + data setup time of Am29520
  - = 104 ns.
- Clock to output of microprogram pipeline reg + 2 slice delay of Am29501
  - + CCMUX prop delay
  - + COMOX prop delay
  - + Am2910 setup and prop delay
  - + microprogram memory access time
  - = 178 ns.

The worst case path is 178 ns which means that the maximum clock frequency can be 5.62 MHz; less than the goal of 10 MHz. Reducing microcode is not possible. With the architecture chosen, doing a butterfly in 4 cycles is the best one can do; it's time to make a trade-off. As this design is for an evaluation board, the goals of Category 1 cannot be relaxed. The two possible paths left are to add more computing power to the design or relax the goals of Category 2. Again, because this design is



for an evaluation board, the latter of the two paths is chosen. The worst case path is not likely to be executed very often, so it is supported by changing the clock period with the Am2925, and the 141 ns path is considered worst case making the clock frequency 7.09 MHz. This path can be improved by putting in faster transceivers and faster data memories like the Am2148-35 or the Am9150-25. Using the latter brings the speed up to 8.26 MHz. Adding more computing power would mean providing more ALU's and/or multipliers. The designer would have to evaluate by how much the microcode would reduce if there was more arithmetic processing power. Timing paths would have to be recalculated and a new estimate made for the maximum clock frequency.

## 4.1.7 Microcode

This section deals with the software aspect of the design. Microprogramming involves writing a coherent sequence of microinstructions used to execute the various steps required by the process. A microinstruction usually has two primary parts: 1) the definition and control of all elemental micro-operations to be carried out; and 2) the definition and control of the address of the next microinstruction to be executed.

For our design, the definition of the various microoperations to be carried out includes the Real and Imaginary ALU's, multiplier, data address generation, data memory control, address pipeline registers, shifters, clock controller and condition code MUX select. The definition of the next microinstruction function includes identifying the source selection of the next microinstruction address and supplying the starting address for any process.

Two basic principles should be remembered when writing microcode: 1) parallel execution of different operations, and 2) maximum utilization of resources due to the pipelined architecture.

The microcode for the FFT is described here in some detail.

The FFT algorithm is highly repetitive. The same butterfly operations are performed on different sets of numbers. Each radix 2 butterfly consists of four multiplies and six adds/subs. The code developed here is for a radix 2 DIT FFT. The equations for a radix 2 DIT butterfly are:

A1 = A + BWk B1 = A - BWk

where A and B are the complex input points, A1 and B1 are the complex results and Wk is the complex coefficient. The following rules should be kept in mind when developing microcode:

- a) Determine the program repetition rate. We have determined that our repetition rate is R = 4, i.e., each butterfly will take 4 cycles.
- b) Start programming at line R +1=5.
- c) For every program entry, enter an 'X' R cycles above and below (Table 4-3).

The arithmetic section will be programmed first. Table 4-2 is a programming work sheet for this section. Figures 4-1.9 through 4-1.18 and Tables 4-3 through 4-7 show, the development of the code for the arithmetic section. Each block diagram showing data movement for a particular line of code is followed by a coding sheet showing the corresponding line. If a conflict of resources occurs, another resource must be used or the function re-scheduled for execution.

Note in Table 4-9 that code is repeated every 4th line. Also note that the ALU's and multiplier are utilized 100% of the time. Also note in Table 4-9 that a new butterfly starts every 4th cycle. It is the pipelined process that makes the butterfly time equal to 4 cycles. At any given time, computation for 3 butterflies is in progress as illustrated in Figure 4-1.19.

Next the code for the address generating section needs to be written. Obviously it needs to be mapped into the code already written for the arithmetic section. Tables 4-10 and 4-11. show the coding for the coefficient PROM select and for the address pipeline register for the coefficient PROMs. Tables 4-12 through 4-14 show the coding for the FFT address generator and the pipeline register. Figures 4-1.20 through 4-1.30 show the data flow from the address generator through the data and coefficient address pipeline register to the memories.

Having decided upon the code, it is now necessary to actually write it. A symbolic language would no doubt be of great help. This is possible with the AMDASM meta-assembler which is used in two phases. The first phase consists of defining the microinstructions and the language. This is done by creating a file of the type XXXX.DEF (see Appendix 2, DSP.DEF) which contains:

- a) The microword width (WORD 128).
- b) The list of language mnemonics. Each mnemonic would be associated with an instruction field defining the bits controlling each resource.

Table 4-2

| Multiplier | MULT           |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|------------|----------------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|
| Mu         | OIM            |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | B <sub>3</sub> |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | В              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | œ              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
| nary       | Å              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
| lmaginary  | A <sub>2</sub> |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
| -          | Ą              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | ALU            |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | å              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | B2             |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | æ              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
| a          | Å              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
| Real       | 42             |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | ∢              |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | ALU            |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | DIO            |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |
|            | Step           | - | 2 | 9 | 4 | 2 | 9 | 7 | 8 | 6 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |



| Table | 4-3 |
|-------|-----|
|-------|-----|

| ~ | -1   |        |     |    | Re             | al             |                |                |                |     |                | Imag           | inary          |                |                |                | Mul | tiplier |
|---|------|--------|-----|----|----------------|----------------|----------------|----------------|----------------|-----|----------------|----------------|----------------|----------------|----------------|----------------|-----|---------|
|   | Step | DIO    | ALU | A, | A <sub>2</sub> | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | В <sub>3</sub> | ALU | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | мю  | MULT    |
|   | 1    | x      |     |    |                |                | х              |                |                |     |                |                |                | x              |                |                |     |         |
|   | 2    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 3    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 4    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 5    | Read B |     |    |                |                | DI             |                |                |     |                |                |                | DI             |                |                |     |         |
|   | 6    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 7    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 8    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 9    | x      | 4   |    |                |                | x              |                |                |     |                |                |                | x              | ,              |                |     |         |
|   | 10   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 11   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 12   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 13   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 14   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 15   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 16   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 17   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |
|   | 18   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |     |         |



| Tab | le | 4-4 |
|-----|----|-----|
|-----|----|-----|

|    |      |        |     |    | Re             | eal            |                |                |                |     |                | Imag           | inary          |                |                |                | Mul            | tiplier |
|----|------|--------|-----|----|----------------|----------------|----------------|----------------|----------------|-----|----------------|----------------|----------------|----------------|----------------|----------------|----------------|---------|
|    | Step | DIO    | ALU | A1 | A <sub>2</sub> | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | ALU | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | мю             | MULT    |
|    | 1    | x      |     |    |                |                | x              |                |                |     |                |                |                | x              |                |                |                |         |
|    | 2    | x      |     | x  |                |                |                |                |                |     | x              |                |                | x              |                |                | x              |         |
|    | 3    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 4    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
| ſ  | 5    | Read B |     |    |                |                | DI             |                |                |     |                |                |                | DI             |                |                |                |         |
| -> | 6    | Read A |     | DI |                |                |                |                |                |     | DI             |                |                | н              |                |                | B <sub>R</sub> |         |
| ſ  | 7    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
| ſ  | 8    |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
| ſ  | 9    | x      |     |    |                |                | x              |                |                |     |                |                |                | x              |                |                |                |         |
| ĺ  | 10   | x      |     | x  |                |                |                |                |                |     | x              |                |                | x              |                |                | x              |         |
|    | 11   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 12   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 13   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 14   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 15   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 16   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 17   | 1      |     |    | 1              |                |                |                |                |     |                |                |                |                |                |                |                |         |
|    | 18   |        |     |    |                |                |                |                |                |     |                |                |                |                |                |                |                |         |

...





\$





|      |        |         |    | Re  | al             |    |                |                |        |                | Imag           | inary          |                |                |                | Mu             | ltiplier                      |
|------|--------|---------|----|-----|----------------|----|----------------|----------------|--------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|-------------------------------|
| Step | DIO    | ALU     | A1 | A2  | A <sub>3</sub> | Bı | B <sub>2</sub> | В <sub>3</sub> | ALU    | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | мю             | MULI                          |
| 1    | x      |         |    |     |                | x  |                |                |        |                |                |                | x              | 1              |                |                |                               |
| 2    | x      |         | x  |     |                |    |                |                |        | x              |                |                | x              |                |                | x              |                               |
| 3    |        |         | x  |     |                |    |                |                | -      | x              |                |                | x              |                |                |                | x                             |
| 4    |        | x       | x  |     | x              |    | x              |                |        | x              |                |                |                |                |                | x              | x                             |
| 5    | Read B |         | x  |     | x              | DI | x              |                | x      | x              |                | x              | DI             | x              |                |                | x                             |
| 6    | Read A | x       | DI | x   |                |    | x              |                | x      | X              |                | . <b>X</b>     | н              | x              | x              | B <sub>R</sub> | x                             |
| 7    |        |         | н  |     |                |    |                |                |        | н              |                |                | н              |                |                |                | B <sub>R</sub> W <sub>R</sub> |
| 8    |        | A1-MSP  | н  |     | MSP            |    | ALU            |                |        | н              |                |                |                |                |                | BI             | B <sub>R</sub> W <sub>I</sub> |
| 9    | x      |         | н  |     | н              | x  | н              |                | A1-MSP | н              |                | MSP            | x              | ALU            |                |                | в <sub>і</sub> w <sub>R</sub> |
| 10   | x      | A1 + A3 | x  | ALU |                |    | н              |                | B2-MSP | HX             |                | н              | x              | ALU            | MSP            | x              | в <sub>і</sub> w <sub>і</sub> |
| 11   |        |         | x  |     |                |    |                |                |        | x              |                |                | x              |                |                |                | x                             |
| 12   |        | x       | x  |     | x              |    | x              |                |        | x              |                |                |                |                |                | x              | x                             |
| 13   |        |         | x  |     | x              |    | x              |                | x      | x              |                | x              |                | x              |                |                | x                             |
| 14   |        | x       |    | x   |                |    | x              |                | ´ X    | x              |                | x              |                | x              | x              |                | x                             |
| 15   |        |         |    |     |                |    |                |                |        |                |                |                |                |                |                |                |                               |
| 16   |        |         |    |     |                |    |                |                |        |                |                |                |                |                |                |                |                               |
| 17   |        |         |    |     |                |    |                |                |        |                |                |                |                |                |                |                |                               |
| 18   |        |         |    |     |                |    |                |                |        |                |                |                |                |                |                |                |                               |

Table 4-5 Line 10: a conflict has developed in Im A1

|      |        |         |                | Re               | al             |                |                |    |        |                | Imag           | inary          |                |                |     | Mul            | tiplier                                  |
|------|--------|---------|----------------|------------------|----------------|----------------|----------------|----|--------|----------------|----------------|----------------|----------------|----------------|-----|----------------|------------------------------------------|
| Step | DIO    | ALU     | A <sub>1</sub> | A <sub>2</sub> - | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | В3 | ALU    | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | В3  | MIO            | MULT                                     |
| 1.   | x      |         |                |                  |                | x              |                |    |        |                |                |                | x              |                |     |                |                                          |
| 2    |        |         |                |                  |                |                |                |    |        |                |                |                | x              |                |     | x              |                                          |
| 3    | x      |         | x              |                  |                |                |                |    |        | x              |                |                | x              |                |     |                | x                                        |
| 4    |        | x       | x              |                  | x              |                | x              |    |        | x              |                |                |                |                |     | x              | x                                        |
| 5    | Read B |         | x              |                  | x              | DI             | x              |    | x      | x              |                | x              | DI             | x              |     |                | x                                        |
| 6    |        | x       |                | x                |                |                | x              |    | x      | x              |                | x              | н              | x              | x   | B <sub>R</sub> | x                                        |
| - 7  | Read A |         | DI             |                  |                | x              |                |    |        | DI             |                |                | H              |                |     |                | ₿ <sub>R</sub> ₩ <sub>R</sub>            |
| 8    |        | A1-MSP  | н              |                  | MSP            |                | ALU            |    |        | н              |                |                | x              |                |     | BI             | <sup>₿</sup> <sub>R</sub> ₩ <sub>I</sub> |
| 9    | x      |         | н              |                  | H              |                | н              |    | A1-MSP | н              |                | MSP            | x              | ALU            |     |                | в <sub>і</sub> w <sub>г</sub>            |
| 10   |        | A1 + A3 |                | ALU              |                |                | н              |    | B2-MSP | н              |                | н              | x              | ALU            | MSP | x              | в <sub>і</sub> w <sub>і</sub>            |
| 11   | x      |         | x              |                  |                |                |                |    |        | x              |                |                |                |                |     |                | x                                        |
| 12   |        | x       | x              |                  | x              |                | x              |    |        | x              |                |                |                |                |     | x              | x                                        |
| 13   | ,      |         | x              |                  | x              |                | x              |    | x      | x              |                | x              |                | ×              |     |                | x                                        |
| 14   |        | x       |                | x                |                |                | x              |    | x      | x              |                | ×              |                | x              | x   |                | x                                        |
| 15   |        |         |                |                  |                |                |                |    |        |                |                |                |                |                |     |                |                                          |
| 16   |        |         |                |                  |                |                |                |    |        |                |                |                |                |                |     |                |                                          |
| 17   |        |         |                |                  |                |                |                |    |        |                |                |                |                |                |     |                |                                          |
| 18   |        |         |                |                  |                |                |                |    |        |                |                |                |                |                |     |                |                                          |

 Table 4-6
 Line 10: conflict in Im A1 resolved by delaying Read A one cycle to Line 7 (from 6).









| ſ |      |          |            |                | Re             | al             |     |                |    |         |                | Imagi          | inary          |    |                |     | Mul            | tiplier                       |
|---|------|----------|------------|----------------|----------------|----------------|-----|----------------|----|---------|----------------|----------------|----------------|----|----------------|-----|----------------|-------------------------------|
|   | Step | DIO      | ALU        | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | Bį  | B <sub>2</sub> | B3 | ALU     | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B  | B <sub>2</sub> | в3  | MIO            | MULT                          |
| ĺ | 1    | x        |            |                |                |                | x   |                |    |         |                |                |                | x  |                |     |                |                               |
|   | 2    |          |            |                |                |                |     |                |    |         |                |                |                | x  |                |     | х              |                               |
|   | 3    | x        |            | x              |                |                |     |                |    |         | x              |                |                | x  |                |     |                | x                             |
|   | 4    |          | x          | x              |                | x              |     | x              |    |         | x              |                |                |    |                |     | x              | x                             |
| > | 5    | Read B   |            | x              |                | x              | DI  | x              |    | x       | x              |                | x              | DI | x              |     |                | x                             |
|   | 6    |          | x          |                | x              |                |     | x              |    | x       | x              |                | x              | н  | x              | x   | B <sub>R</sub> | x                             |
|   | 7    | Read A   | x          | DI             | x              |                | x   | x              |    | x       | DI             | x              | x              | н  | x              |     |                | B <sub>R</sub> W <sub>R</sub> |
|   | 8    | x        | A1-MSP     | н              | x              | MSP            | x   | ALU            |    | x       | н              | x              |                |    |                |     | BI             | B <sub>R</sub> W <sub>I</sub> |
|   | 9    | x        | <b>X</b> . | н              | x              | н              | x   | н              |    | A1-MSP  | н              | x              | MSP            | x  | ALU            |     |                | в <sub>I</sub> W <sub>R</sub> |
|   | 10   | x        | A1 + A3    |                | ALU            |                |     | н              |    | B2-MSP  | ۰H             |                | н              | x  | ALU            | MSP | x              | в <sub>і</sub> W і            |
|   | 11   | x        | B2+MSP     | x              | н              |                | MSP | ALU            |    | A1 + B3 | x              | ALU            | н              | x  | н              |     |                | x                             |
|   | 12   | Write B2 | x          | x              | н              | x              | н   | x              |    | A2 + A3 | x              | ALU            |                |    |                |     | ×              | x                             |
|   | 13   |          | A2 - B1    | x              | ALU            | x              |     | x              |    | x       | x              | н              | x              |    | x              |     |                | x                             |
|   | 14   | Write A2 | x          |                | x              |                |     | x              |    | x       | x              |                | x              |    | x              | x   |                | x                             |
|   | 15   |          | x          |                | x              |                | x   | - <b>X</b>     |    | x       |                | x              | x              |    | x              |     |                |                               |
|   | 16   | x        |            |                | x              |                | x   |                |    | x       |                | x              |                |    |                |     |                |                               |
|   | 17   |          | x          |                | x              |                |     |                |    |         |                | x              |                |    |                |     |                |                               |
|   | 18   | x        |            |                |                |                |     |                |    |         |                |                |                |    |                |     |                |                               |

Table 4-7 One complete butterly: lines 5-14 (10 lines).

|      |          |         |                | Re             | al             |     |                |                |         |                | Imag           | inary          |    |                |     | Mul            | tiplier                       |
|------|----------|---------|----------------|----------------|----------------|-----|----------------|----------------|---------|----------------|----------------|----------------|----|----------------|-----|----------------|-------------------------------|
| Step | DIO      | ALU     | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B   | B <sub>2</sub> | B <sub>3</sub> | ALU     | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | Bı | B <sub>2</sub> | в3  | мю             | MULI                          |
| 1    | Read B   | A2 - B1 | н              | ALU            | н              | DI  | н              |                | A1-MSP  | н              | н              | MSP            | DI | ALU            |     |                | B <sub>1</sub> W <sub>F</sub> |
| 2    | Write A2 | A1 + A3 |                | ALU            |                |     | н              |                | B2-MSP  | н              |                | н              | н  | ALU            | MSP | 8 <sub>8</sub> | в <sub>1</sub> W <sub>1</sub> |
| 3    | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |                | A1 + B3 | DI             | ALU            | н              | н  | н              |     |                | ₿ <sub>₽</sub> ₩              |
| 4    | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |                | A2 + A3 | н              | ALU            |                |    |                |     | B              | ₿ <sub>₽</sub> ₩              |
| 5    | Read B   | A2 - B1 | н              | ALU            | н              | DI  | н              |                | A1-MSP  | н              | н              | MSP            | DI | ALU            |     |                | в <sub>и</sub> м              |
| 6    | Write A2 | A1 + A3 |                | ALU            |                |     | н              |                | B2-MSP  | н              |                | н              | н  | ALU            | MSP | B <sub>R</sub> | B   W                         |
| 7    | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |                | A1 + B3 | DI             | ALU            | н              | н  | н              |     |                | B W                           |
| 8    | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |                | A2 + A3 | н              | ALU            |                |    |                |     | B <sub>I</sub> | в <sub>к</sub> w              |
| 9    | Read B   | A2 - B1 | н              | ALU            | н              | DI  | н              |                | A1-MSP  | н              | н              | MSP            | DI | ALU            |     |                | в                             |
| 10   | Write A2 | A1 + A3 |                | ALU            |                |     | н              |                | B2-MSP  | н              |                | н              | Н  | ALU            | MSP | B <sub>R</sub> | в <sub>і</sub> w              |
| 11   | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |                | A1 + B3 | DI             | ALU            | н              | Н  | н              |     |                | в "W                          |
| 12   | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |                | A2 + A3 | н              | ALU            |                |    |                |     | BI             | в <sub>я</sub> w              |
| 13   | Read B   | A2 - B1 | н              | ALU            | н              | DI  | н              |                | A1-MSP  | н              | н              | MSP            | DI | ALU            |     |                | в <sub>1</sub> w <sub>г</sub> |
| 14   | Write A2 | A1 + A3 |                | ALU            |                |     | н              |                | B2-MSP  | н              |                | н              | н  | ALU            | MSP | BR             | в <sub>I</sub> w              |
| 15   | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |                | A1 + B3 | DI             | ALU            | н              | н  | н              |     |                | B <sub>R</sub> W              |
| 16   | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |                | A2 + A3 | н              | ALU            |                |    |                |     | BI             | B <sub>R</sub> W              |
| 17   | Read B   | A2 - B1 | н              | ALU            | н              | DI  | н              |                | A1-MSP  | н              | н              | MSP            | DI | ALU            |     |                | в                             |
| 18   | Write A2 | A1 + A3 |                | ALU            |                |     | н              |                | B2-MSP  | н              | 1              | н              | н  | ALU            | MSP | BR             | в,w                           |

 Table 4-8
 The code repeats every 4 lines. Thus a new butterfly is completed every 4 cycles.

|      |          |         |                | Re             | al             |     |                |    |         |    | lmag           | inary          |    |                |                | Mul            | tiplier                                  |
|------|----------|---------|----------------|----------------|----------------|-----|----------------|----|---------|----|----------------|----------------|----|----------------|----------------|----------------|------------------------------------------|
| Step | DIO      | ALU     | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | B   | B <sub>2</sub> | B3 | ALU     | Ą  | A <sub>2</sub> | A <sub>3</sub> | B  | B <sub>2</sub> | В <sub>3</sub> | мю             | MULI                                     |
| 1    | Read B   | A2 - B1 | н              | ALU            | н              | DI  | H              |    | A1-MSP  | н  | н              | MSP            | DI | ALU            |                |                | BIWF                                     |
| 2    | Write A2 | A1 + A3 |                | ALU            |                |     | н              |    | B2-MSP  | н  |                | н              | н  | ALU            | MSP            | BR             | в <sub>I</sub> WI                        |
| 3    | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |    | A1 + B3 | DI | ALU            | н              | н  | н              |                |                | B <sub>R</sub> W <sub>F</sub>            |
| 4    | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |    | A2 + A3 | н  | ALU            |                |    |                |                | B              | ₿ <sub>R</sub> ₩                         |
| 5    | Read B   | A2 - B1 | н              | ALU            | Ŧ              | DI  | н              |    | A1-MSP  | н  | н              | MSP            | DI | ALU            |                |                | в <sub>і</sub> М <sub>R</sub>            |
| 6    | Write A2 | A1 + A3 |                | ALU            |                |     | н              |    | B2-MSP  | н  |                | н              | н  | ALU            | MSP            | B <sub>R</sub> | в <sub>I</sub> W <sub>I</sub>            |
| 7    | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |    | A1 + B3 | DI | ALU            | н              | н  | н              |                |                | <sup>₿</sup> <sub>R</sub> ₩ <sub>F</sub> |
| 8    | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |    | A2 + A3 | н  | ALU            |                |    |                |                | BI             | B <sub>R</sub> W                         |
| 9    | Read B   | A2 - B1 | н              | ALU            | н              | DI  | н              |    | A1-MSP  | H  | н              | MSP            | DI | ALU            |                |                | в,พ,                                     |
| 10   | Write A2 | A1 + A3 |                | ALU            |                |     | н              |    | B2-MSP  | н  |                | н              | н  | ALU            | MSP            | B <sub>R</sub> | в <sub>I</sub> w                         |
| 11   | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |    | A1 + B3 | DI | ALU            | н              | н  | н              |                |                | в <sub>R</sub> W                         |
| 12   | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |    | A2 + A3 | н  | ALU            |                |    |                |                | BI             | B <sub>R</sub> W                         |
| 13   | Read B   | A2 - B1 | H              | ALU            | н              | DI  | н              |    | A1-MSP  | H  | н              | MSP            | DI | ALU            |                |                | B <sub>1</sub> W <sub>F</sub>            |
| 14   | Write A2 | A1 + A3 |                | ALU            |                |     | н              |    | B2-MSP  | н  |                | н              | н  | ALU            | MSP            | BR             | B,W                                      |
| 15   | Read A   | B2+MSP  | DI             | н              |                | MSP | ALU            |    | A1 + B3 | DI | ALU            | н              | н  | н              |                |                | B <sub>R</sub> W                         |
| 16   | Write B2 | A1-MSP  | н              | н              | MSP            | н   | ALU            |    | A2 + A3 | н  | ALU            |                |    |                |                | BI             | B <sub>R</sub> W                         |
| 17   | Read B   | A2 • B1 | н              | ALU            | н              | DI  | н              |    | A1-MSP  | н  | н              | MSP            | DI | ALU            |                |                | в "พ                                     |
| 18   | Write A2 | A1 + A3 |                | ALU            |                | Ι   | н              |    | B2-MSP  | н  | Γ              | н              | н  | ALU            | MSP            | BR             | в w                                      |

 Table 4-9
 A new butterfly must also start every 4 cycles.



 Table 4-10
 Coefficient selection: work back from required products.

|      | FFT Ac | ldr Gen | D Addr | Pipeline |            |                    |             |          |        | Mu       | ltiplier                      |
|------|--------|---------|--------|----------|------------|--------------------|-------------|----------|--------|----------|-------------------------------|
| Step | Instr  | A SEL   | Instr  | SEL      | RAM<br>R/W | Register<br>Enable | PROM<br>SEL | DIO      |        |          | Mult                          |
| 1    |        |         |        |          |            |                    |             | x        |        | $\Box$   |                               |
| 2    |        |         |        |          |            |                    | x           |          |        | $\Gamma$ |                               |
| 3    |        |         |        |          |            |                    | X           | x        |        | ΙL       | x                             |
| 4    |        |         |        |          |            |                    | x           |          |        |          | x                             |
| 5    |        |         |        |          |            |                    | x           | Read B   |        | ] [      | x                             |
| 6    |        |         |        |          |            |                    | Re          |          |        | / [      | x                             |
| 7    |        |         |        |          |            |                    | Im          | Read A   |        | Ľ        | ₿ <sub>R</sub> ₩ <sub>R</sub> |
| 8    |        |         |        |          |            |                    | Re          | x        |        | L        | B <sub>R</sub> W <sub>I</sub> |
| 9    |        |         |        |          |            |                    | Im          | x        |        |          | в <sub>I</sub> W <sub>R</sub> |
| 10   |        |         |        |          |            |                    | x           | x        |        | L        | B <sub>I</sub> W <sub>I</sub> |
| 11   |        |         |        |          |            |                    | x           | x        |        |          | x                             |
| 12   |        |         |        |          |            |                    | x           | Write B2 |        |          | x                             |
| 13   |        |         |        |          |            |                    | x           |          |        |          | x                             |
| 14   |        |         |        |          |            |                    |             | Write A2 |        |          | x                             |
| 15   |        |         |        |          |            |                    |             |          |        |          |                               |
| 16   |        |         |        |          |            |                    |             | x        |        |          |                               |
| 17   |        |         |        |          |            |                    |             |          | $\Box$ | Γ        |                               |
| 18   |        |         |        |          |            |                    |             | x        |        | I I      |                               |
|      |        |         |        |          |            |                    | 1           |          |        |          |                               |

|      | FFT Ad | dr Gen | D Addr I | Pipeline |            |                    |             |          | $\prod$ | Mul | tiplier                       |
|------|--------|--------|----------|----------|------------|--------------------|-------------|----------|---------|-----|-------------------------------|
| Step | Instr  | A SEL  | Instr    | SEL      | RAM<br>R/W | Register<br>Enable | PROM<br>SEL | DIO      | ][      |     | Mult                          |
| 1    |        | x      |          |          |            | · x                |             | x        | ЛI      |     |                               |
| 2    |        |        |          |          |            | x                  | x           |          |         |     |                               |
| 3    |        |        |          |          |            | x                  | x           | x        |         |     | x                             |
| 4    |        |        |          |          |            | x                  | x           |          |         | ſ   | x                             |
| 5    |        | 8      |          |          |            | En                 | x           | Read B   |         | ιC  | x                             |
| 6    |        |        |          |          |            | н                  | Re          |          |         | 'Γ  | x                             |
| 7    |        |        |          |          |            | н                  | Im          | Read A   | 7       | Γ   | B <sub>R</sub> W <sub>R</sub> |
| 8    |        |        |          |          |            | н                  | Re          | x        | 7       | Γ   | B <sub>R</sub> W 1            |
| 9    |        | x      |          |          |            | x                  | Im          | x        | 7       |     | B <sub>I</sub> W <sub>R</sub> |
| 10   |        |        |          |          |            | x                  | x           | x        | 7       |     | B <sub>I</sub> W <sub>I</sub> |
| 11   |        |        |          |          |            | x                  | x           | x        |         |     | x                             |
| 12   |        |        |          |          |            | x                  | x           | Write B2 |         | Γ   | x                             |
| 13   |        |        |          |          |            |                    | x           |          | 7       | Γ   | x                             |
| 14   |        |        |          |          |            |                    |             | Write A2 | 7       | Γ   | x                             |
| 15   |        |        |          |          |            |                    |             |          | 7       | Γ   |                               |
| 16   |        |        |          |          |            |                    |             | x        |         | Γ   |                               |
| 17   |        |        |          |          |            | 1                  |             |          |         | Γ   |                               |
| 18   |        |        |          |          |            |                    |             | x        |         | Γ   |                               |

Table 4-11 Coefficient pipeline: pick up 'k' from 29540 and hold for all four products.

Table 4-12 Data-address pipelining: needs 2-level push-down-only stack in an Am29520/21.

|      | FFT Addr Gen |       | D Addr Pipeline |     |            |                    |             |          |        | Mul       | tiplier                       |  |
|------|--------------|-------|-----------------|-----|------------|--------------------|-------------|----------|--------|-----------|-------------------------------|--|
| Step | Instr        | A SEL | Instr           | SEL | RAM<br>R/W | Register<br>Enable | PROM<br>SEL | DIO      |        |           | Mult                          |  |
| 1    |              | x     |                 | x   | x          | x                  |             | x        |        |           |                               |  |
| 2    |              |       |                 |     |            | x                  | x           |          | $\neg$ |           |                               |  |
| 3    |              |       |                 | x   | x          | x                  | x           | x        |        | ιC        | x                             |  |
| 4    |              |       |                 |     |            | x                  | x           |          |        | [         | x                             |  |
| 5    |              | 8     |                 | B1  | R          | En                 | x           | Read B   |        | $1\Gamma$ | x                             |  |
| 6    |              |       |                 |     |            | н                  | Re          |          |        | 7Γ        | x                             |  |
| 7    |              |       |                 | A1  | R          | н                  | Im          | Read A   | 1      | Γ         | B <sub>R</sub> W <sub>R</sub> |  |
| 8    |              |       |                 | x   | x          | н                  | Re          | x        | 7      | Γ         | ₿ <sub>R</sub> ₩ <sub>I</sub> |  |
| 9    |              | x     |                 | x   | x          | x                  | Im          | x        | ]      |           | BIWR                          |  |
| 10   |              |       |                 | x   | x          | x                  | x           | x        |        |           | в <sub>і</sub> w <sub>і</sub> |  |
| 11   |              |       |                 | x   | x          | x                  | x           | x        |        |           | x                             |  |
| 12   |              |       |                 | B2  | w          | x                  | x           | Write B2 |        |           | x                             |  |
| 13   |              |       |                 |     |            |                    | x           |          |        |           | x                             |  |
| 14   |              |       |                 | A2  | w          |                    |             | Write A2 | $\Box$ | Γ         | x                             |  |
| 15   |              |       |                 |     |            |                    |             |          |        |           |                               |  |
| 16   |              |       |                 | x   | x          |                    |             | x        |        |           |                               |  |
| 17   |              |       |                 |     |            |                    |             |          | $\neg$ | Γ         |                               |  |
| 18   |              |       |                 | x   | x          |                    |             | x        |        | Γ         |                               |  |
|      |              |       |                 | 1   | Ť          |                    |             |          |        |           |                               |  |

Table 4-13 Data-address pipeline: pipeline register instructions; 29540 address selection.

|      | FFT Addr Gen |       | D Addr Pipeline |     |            |                    |             |          | / Multiplier |    |                               |
|------|--------------|-------|-----------------|-----|------------|--------------------|-------------|----------|--------------|----|-------------------------------|
| Step | Instr        | A SEL | Instr           | SEL | RAM<br>R/W | Register<br>Enable | PROM<br>SEL | ыо       | ][           |    | Mult                          |
| 1    |              | x     | x               | x   | x          | x                  |             | x        | J            |    |                               |
| 2    |              | x     | x               |     |            | x                  | x           |          |              |    |                               |
| 3    |              |       | x               | x   | x          | x                  | x           | x        |              |    | x                             |
| 4    |              | 1     | Push B          |     |            | x                  | x           |          |              | Γ  | x                             |
| 5    |              | 8     | н               | B1  | R          | En                 | x           | Read B   |              | ΙĽ | x                             |
| 6    |              | 0     | Push A          |     |            | н                  | Re          |          | 1            | 'Γ | x                             |
| 7    |              |       | н               | A1  | R          | н                  | Im          | Read A   | 7            | Γ  | B <sub>R</sub> W <sub>R</sub> |
| 8    |              | x     | x               | x   | x          | н                  | Re          | x        | 7            | Γ  | B <sub>R</sub> W <sub>1</sub> |
| 9    |              | x     | x               | x   | x          | x                  | Im          | x        |              |    | B <sub>I</sub> W <sub>R</sub> |
| 10   |              | x     | x               | x   | x          | x                  | x           | x        | 7            |    | B <sub>1</sub> W <sub>1</sub> |
| 11   |              |       | x               | x   | x          | x                  | x           | x        |              |    | x                             |
| 12   |              | x     | x               | B2  | w          | x                  | x           | Write B2 |              |    | x                             |
| 13   |              | x     | x               | x   | x          | x                  | x           |          |              |    | ×                             |
| 14   |              | x     | x               | A2  | w          | x                  | x           | Write A2 | 7            | Γ  | x                             |
| 15   |              |       | x               | x   | x          | x                  | x           |          |              |    |                               |
| 16   |              |       |                 | x   | x          | x                  | x           | x        |              | Γ  |                               |
| 17   |              |       |                 |     |            |                    | x           |          |              | Γ  |                               |
| 18   |              |       |                 | x   | x          |                    |             | x        |              | Γ  | Γ                             |

Table 4-14 Full address generation code: for one butterfly; still to be overlapped.

|      | FFT Ad | dr Gen | D Addr Pipeline |     |            |                    |             |          | Multiplier |     |   |                               |
|------|--------|--------|-----------------|-----|------------|--------------------|-------------|----------|------------|-----|---|-------------------------------|
| Step | Instr  | A SEL  | Instr           | SEL | RAM<br>R/W | Register<br>Enable | PROM<br>Sel | DIO      | ][         |     |   | Mult                          |
| 1    | x      | х      |                 | x   | x          | x                  |             | x        |            | Ľ   |   |                               |
| 2    | x      | х      | x               |     |            | x                  | x           |          |            |     |   |                               |
| 3    | COUNT  |        |                 | x   | x          | x                  | x           | x        |            |     |   | x                             |
| 4    | н      | 1      | Push B          |     |            | x                  | x           |          |            |     |   | x                             |
| 5    | н      | 8      |                 | B1  | R          | En                 | x           | Read B   |            | ] ' |   | x                             |
| 6    | н      | 0      | Push A          |     |            | н                  | Re          |          |            | /   |   | x                             |
| 7    | x      |        |                 | A1  | R          | н                  | Im          | Read A   |            | '   |   | B <sub>R</sub> W <sub>R</sub> |
| 8    | x      | x      | ×               | x   | x          | н                  | Re          | x        | 7          | ſ   |   | B <sub>R</sub> W <sub>I</sub> |
| 9    | x      | x      |                 | x   | x          | x                  | Im          | x        |            |     |   | в <sub>I</sub> W <sub>R</sub> |
| 10   | x      | x      | x               | x   | x          | x                  | x           | x        |            |     |   | B <sub>I</sub> W <sub>I</sub> |
| 11   | x      |        |                 | x   | x          | x                  | x           | x        |            |     |   | х                             |
| 12   | x      | x      | x               | B2  | w          | x                  | x           | Write B2 |            |     |   | x                             |
| 13   | x      | x      |                 | x   | x          | x                  | x           |          |            |     |   | x                             |
| 14   | x      | x      | x               | A2  | w          | x                  | x           | Write A2 | 7          |     |   | x                             |
| 15   |        |        |                 | x   | X          | x                  | x           |          |            |     |   |                               |
| 16   |        |        |                 | x   | x          | x                  | x           | x        |            | Γ   |   |                               |
| 17   |        |        |                 |     |            |                    | x           |          |            |     |   |                               |
| 18   |        |        |                 | x   | x          |                    |             | x        |            | l   | Π |                               |



ങ





















- c) The format definition for each instruction. It consists of defining the order in which the value of each field is given while writing the microprogram.
- d) The second phase consists of writing the microprogram using the defined language to create a source file (Appendix 3, FFT.SRC).

Another meta-assembler is MACASM by Microtec. Further queries can be directed to: Microtec Research, Inc., 3930 Freedom Circle, #101, Santa Clara, CA 95054, (408) 733-2919.

Table 4-9 shows that the four lines of code for the butterfly are repeated. These are put in a loop and the condition for exiting from the loop is the FFT complete flag from the address sequencer. Some code is required to fill the pipeline and some to flush the pipeline at the end of the process. The following is the sequence of events that would occur from power up.

On power up when the system is reset, the opcode for the JZ instruction is applied to the sequencer as described in the control section. This causes the sequencer to branch to location 0 of the microcode. The op-code for the sequencer at this location is a jump to location 1. This is done to enable a development system to jam address zero on the address bus to emulate the JZ instruction. The PC in the sequencer is updated to the externally produced address by the jump instruction. At location 1, the sequencer waits for the load IR signal to go active. When this happens, the sequencer goes to location 2 and waits for the signal to go inactive. When this happens, the sequencer goes to location 3 and jumps to 1 of 8 locations at the end of the microcode memory. At this location, the start address for the process is obtained and the sequencer jumps to this location. The process is completed and the process complete flag is set. When the CPU acknowledges this flag, the sequencer returns to location 1 and waits again.

The schematics for this design are included in Appendix7.

### 4.2 DIGITAL FILTERS USING MULTIPLY-ACCUMULATOR

A high speed stand-alone digital filter is frequently implemented with a multiplier/accumulator, temporary storage and a state machine which together perform the following calculation

$$y_n = \sum_{k=0}^{M} a_k * x_{n-k} - \sum_{k=1}^{L} b_k * y_{n-k}$$

where xk is a digitized time sample and vk is the output of the filter. This arithmetic can be performed in real time by a microprocessor or single chip signal processor for low frequency applications but a specialized design is required when the input signal frequencies are above 100 kHz (requiring data sampling at >200 kHz). The Am29PL141 Fuse Programmable Controller (FPC) simplifies the design of microcode controlled filters by incorporating all the control functions in a single chip. It contains a microprogram address sequencer and 16 outputs for control of the other circuits in the system. In a digital filter design these outputs manage the temporary storage of previous inputs and outputs and steer the operands to a multiplier/ accumulator.

Figure 4.2.1 is an example of a second order IIR filter (also called a biquadratic filter) using an Am29PL141 FPC, an Am29510 multiplier/accumulator (MAC), 2-Am29520 multilevel pipeline registers, 2-Am27LS19 PROMs and 2-Am29827 buffers. The Am29510 consists of a 16x16-bit parallel multiplier with a 35-bit accumulator. The Am29520 is a set of four 8-bit pipeline registers which can be configured as two shift registers for this design. Any one of the four registers can be selected at the output of the Am29520. The Am27LS19 PROMs are used to hold the coefficients which determine the characteristics of the filter. The negated coefficients are stored for W3 and W4 to simplify the accumulation and one of four sets of coefficients can selected by strapping two address inputs to the PROMs.

The design implements the equation shown above directly although the sequence of calculations is done using the oldest data first for purposes of data management. Appendixes 4, 5, and 6 contain a meta-assembler file for three IIR filters. Each management. appendix contains a definition file and a source file. The definition file defines a set of custom program flow instructions for the filter and also defines the control functions for the output pins. The source file uses the instructions defined in the definition file. Another way to use a meta-assembler is to define a standard set of device instructions (FPC instructions for this design) instead of IIR filter instructions in the definition phase and to keep the customization in the assembly program. This has advantages in a complex design since it eliminates one pass of the meta-assembler when changes or corrections are made and the program is reassembled.

The operation of the filter consists of receiving data from an A/D or a similar filter section, performing the sum of products required for the filter and sending the output to a D/A converter or another filter section. There is signalling for input data taken and output data ready but synchronous operation is assumed and there is no "handshaking" where the device waits until data is ready or taken. This was done because several sections of this type would normally be cascaded as shown in Figure 4-2.2 to obtain the desired frequency response. The signalling enables the sections to synchronize during powerup and could be used for diagnosing faults which cause the system to fall out of synchronization or to resynchronize after such a fault. The sample program simply sets an error flag and stops if data is not available when needed or taken when the calculation is complete. Since only a fourth of the PROM space is needed for the filter, a strapping option can be used for the coefficient addressing to allow a single design to be used for up to four cascaded sections. A filter constructed out of these sections requires five clock periods to produce an output from each input data sample so 10 MHz parts are capable of operating at a 2 MHz sampling rate and can handle input frequencies up to 1 MHz. These figures can be scaled linearly when faster or slower parts are used.

Data management in the Am29520 is accomplished by configuring the device into two shift registers each containing two levels. The A registers hold the two previous input samples and the B registers hold the two previous outputs. The tri-state buffers on the input isolates it while the output data is routed to shift register B. Two FPC outputs are used to control the complementary tristate enables in order to eliminate the need for an inverter.

A variation of this design could be used for a higher order filter by substituting a RAM or register file for the Am29520s. An example of a sixth order IIR filter is shown in Figure 4-2.3. This example is programmed to handshake on input and output since it is not a section of a cascaded filter. In order to obtain a NOP in the MAC during the handshaking, a zero is stored in the coefficient PROM and a zero product is accummulated while waiting for the input data ready or output data taken handshake. In the previous example it was possible to route operands to the MAC on every cycle since the registers could input and output simultaneously. However, this variation uses two

cycles during which no calculations take place to store input and output values in the RAM. These cycles can be reclaimed to increase the throughput by adding hardware to route data to the MAC while writing into the RAM or by selecting write-transparent RAMs which place the data being written on the output during the write cycle. However, this would only be of value if the data flow was synchronous and the input and output handshaking loops could be removed.

The two shift registers required for the filter are emulated by a RAM in which logical addresses represent the position in the shift register. A counter and an adder are used to translate the logical address to a physical address for the RAM. Reading an input data sample increments the counter and "shifts" the data in the RAM. With a minimum of handshaking time, this filter is capable of sampling data at a 600 kHz rate and filtering data containing frequencies up to 300 kHz when operated at 10 MHz.

Because RAM sizes are available in discrete steps, the direct implementation of a sixth order filter is an acceptable choice. However, the 16 RAM storage locations can be used more efficiently if the canonical form of the IIR filter is implemented. This alternate form of the filter can be used whenever the number of previous inputs in the calculation is equal to the number of previous outputs, i.e., M = L for the summation limits. The circuit shown in Figure 4-2.3 can be used to implement an order-15 filter by changing the FPC program to store intermediate calculation values instead of inputs and outputs. The equations to be calculated become

$$Z_{n} = X_{n} + \sum_{k=0}^{M} a_{k} * Z_{n-k}$$

$$y_n = \sum_{k=0}^{M} b_k * Z_{n-k}$$

The intermediate value z is calculated from the input and previous values of z and then entered in the shift register. The output is then calculated from the values of z in the shift register. A zero coefficient is required to allow the MAC to hold an output value during handshaking as in the previous example. These number of calculations required limit the filter to a sampling rate of 300 kHz.

75

and







## CHAPTER 5 ARTICLES

## Record signal-processing rates spring from chip refinements

Improved buses, reconfigurability, pipelining, and parallelism unite in a bipolar family for building array and signal processors

by Bernard New and Lyle Pittroff, Advanced Micro Devices Inc., Sunnyvale, Calif.

□ The number-crunching microprocessor requirements of the 1980s are ill-served by today's comparatively slow, conventional central processing units. Instead, the algorithms executed by both general-purpose array processors and the more specialized digital-signal processors require highly individual architectures for maximum speed and performance. Jumping on the fast track is a new group of bipolar devices—the AM29500 family that combines internal emitter-coupled-logic circuit design for speed with TTL outputs for compatibility with the outside world.

The family is able to overcome such speed-retarding problems as inadequate data-bus memory and bandwidths and slow execution times through a redesigned bus structure and parallel and pipelined processing. In fact, the bus structure is designed so that there are enough parallel buses to keep a device's multiplier or its arithmetic processing unit, or both, busy during each cycle. These features, plus programmable reconfigurability, make the 29500 family the fastest group of large-scale integrated parts for signal processors to be commercially available. In one series of tests, a 29500based system had three times the speed achieved by the older 2900 family.

The 29500 series are general-purpose building blocks. They include a byte-slice, multiple-port programmable signal processor (the 29501), a 16-by-16-bit parallel



Reprinted from ELECTRONICS July 28,1982, copyright 1982 by McGraw Hill,Inc., with all rights reserved

multiplier with programmable input/output (the 29516/17), a multilevel pipeline register for data and address pipelining (the 29520/21), and a fast-Fourier-transform address sequencer (the 29540).

To increase processor speed, architectural enhancements had to be made to the older 2900 device designs. That family took some steps in the right direction because it provides many of the peripheral building blocks, like interface devices and direct-memory-access chips, needed for real-time signal processing. But the 2900's arithmetic devices are targeted at generalpurpose computing. They do not have the parallel channels that are required for a high-speed array or signal processor environment.

One way of satisfying this need was to upgrade the 2900 family's bus structure, number organization, and resource management. The new bus structure can support addition or subtraction and multiplication on every cycle because of extra parallel buses. Number organization can now handle complex numbers in parallel quickly. In addition, flexibility of resource management permits the building blocks to be interconnected in enough ways to support all algorithms of interest efficiently.

For dedicated-function and multiple-algorithm processing (Fig. 1), a special-purpose processor like the 29501 operates under the control of a host computer system that switches large blocks of data between its

> main memory and temporary slave through DMA transfer. Once this transfer is complete, the special-purpose processor operates under local program control. Each algorithm is executed by its own software routine, which is stored in its own local memory independently of the host computer and its high-level language.

> Although the precise architecture of Fig. 1 varies with the algorithm used, all array- and signal-processing algorithms have similar needs for

> **1. Dual-purpose.** In a typical array- or digital-signal-processor architecture, both dedicated and multiple algorithm functions can be implemented. A host computer provides overall guidance and a large memory.

writing it into RAM locations 000000 to 000007.

When the first four memory cycles are over,  $U_3$  goes into the high state and the decoding occurs. The other half of  $U_2$  furnishes a switch-reset pulse when the system has stabilized. This 11-microsecond pulse sets the processor but does not clear the register. Thus, for all reset conditions set by the  $U_2$ -based switch, the vectors must be fetched from the RAM, thereby allowing the operating system to alter them.

Designer's casebook is a regular feature in *Electronics*. We invite readers to submit original and unpublished circuit ideas and solutions to design problems. Explain briefly but thoroughly the circuit's operating principle and purpose. We'll pay \$75 for each time published.

## Generating a negative voltage for portable instruments

by J. D. MCK. Watson, *Biomedical Engineering Research Group,* University of Sussex, Falmer, Brighton, UK

Many recently designed microcomputer-based portable instruments require +5- and -10-volt dc supplies. Though +5 v can be readily derived from a battery supply by means of a linear regulator, the latter needs a special circuit. This flyback converter presents a novel power supply design that uses just one operational amplifier and a few discrete components. The circuit efficiency is about 75% for a load of about 10 milliamperes, and the output voltage can be changed by substituting an alternative zener diode.

Operational amplifier  $U_1$  functions as a currentsensing threshold switch and is capable of providing a wide output-voltage swing. This threshold is adjusted for optimum supply efficiency and output-voltage regulation.  $Q_1$  is driven by the output of  $U_1$  and operates as a saturating switch, with pulse transformer  $T_1$  functioning as its collector load. The transformer is designed for a turns ratio of 1:1 with primary and secondary inductance of 3 millihenrys and a resistance of 1 ohm.

The current in  $T_1$ 's primary through  $Q_1$  provides a

signal to the inverting input of  $U_1$  whose noninverting input is fed from three sources. A portion of the op amp's output provides positive feedback to ensure fast switching, an ac signal from T<sub>1</sub>'s secondary results in the collapse of the flux before recycling, and a dc component tapped from the output lowers the threshold when the output exceeds zener diode D<sub>2</sub>'s breakdown voltage.

When the circuit is switched on,  $U_1$  delivers a high output to  $Q_1$  and turns it on. Current in the primary of  $T_1$  increases linearly, developing a positive voltage at its secondary. This rising primary current also creates a voltage at the inverting input of  $U_1$  that is sufficient to turn it off. As a result, the flux in  $T_1$  collapses and the secondary current charges capacitor  $C_1$ . During this energy transfer,  $R_4$  holds the noninverting input negative and inhibits the switch from turning on.

As subsequent cycles add charge to  $C_1$ , a point is reached when  $D_2$  conducts and inhibits  $U_1$  through  $R_5$ . This stage is disabled until the dc output voltage falls below the zener threshold, whereupon the circuit resumes oscillation. The amplitude of the output voltage is approximately equal to the zener voltage of  $D_2$ . Because of the nonlinear method of regulation, a small amount of ripple is superimposed on the output. For the component values shown, the ripple is of the order of 40 millivolts, but can be reduced by using a RC filter network at the output. Maximum power output is limited by the supply voltage and by the saturation current of  $T_1$ .



**Flyback converter.** This novel flyback converter uses just one op amp,  $U_1$ , pulse transformer  $T_1$ , and a few discrete components to provide a -10-V dc voltage. The supply ripple contents are low and the circuit efficiency is approximately 75%. Zener diode  $D_2$  is used to set the output voltage.

arithmetic and addressing—short, repetitive calculation loops requiring parallelism and pipelining. In addition, in digital-signal processing, arithmetic operations using complex numbers may be necessary, whereupon the computational load increases to twice as many additions or subtractions and four times as many multiplications as for real numbers.

Because calculation loops for arithmetic operations are short, the 29500 family surrounds the additions with continuous memory accesses—data is fetched, the calculation loop performed, and the results written back into memory. Hence there are many times more memory accesses than there are data points. For FFTs, the number of repetitive memory accesses is multiplied by the number of passes through the data. Fortunately, although the memory-access sequence is long, it is well structured, making it possible as a result to design dedicated address sequencers.

### **Divide and rule**

The purpose of pipelining is to allow lengthy operations to be divided into suboperations, so that when one piece of data has completed a suboperation, the same hardware can start on the next piece. In this way, the 29501 allows up to a 500% speed improvement.

For example, because a typical processor handles a set number of algorithms, its architecture can be very specific concerning arithmetic and address generation—no longer does the CPU have to mix addressing with arithmetic computations. Also, separate sections can be streamlined to calculate each type in parallel and fast.

A significant feature of the data path for the 29500 family is the fact that the devices handle only data and do no address calculations. The data path can, therefore, be optimized for arithmetic.

The 29501 multiport parallel processor also represents the current thinking about multiport organization. It has a data-bus port, an output port to a multiplier, and an input port from a multiplier. The chip can process an FFT fast because of its highly parallel internal bus structure. In this structure, six registers operate as pipelines and are connected to the I/O ports and an arithmetic and logic unit by 10 separate byte-wide internal buses.

A typical cycle on the 29501 consists of data input from memory, data output to the multiplier, retrieving a previous product from the multiplier, and register-toregister ALU operations and data moves. Because these operations can occur during the same cycle, data manipulation is limited only by the designer's creativity. This flexibility, plus the possibility of parallel processors operating on complex numbers, is what makes high-speed operation possible.

### **Twice as fast**

The 29500 family uses two high-speed parallel 16by-16-bit multipliers—the 29516 and 29517. The 29516 is compatible with TRW's MPY-16HJ multiplier but is more than twice as fast and has an output multiplexer. Either the least or the most significant product can be selected at this multiplexer output for use in many pipelined architecture calculations.

On the other hand, the 29517 multiplier incorporates

all the features of the 29516 but has a modified I/Oregister clocking structure to provide a single-clock input with register enables. This approach is preferred to the older clock-gating method, which suffers from skews.

### **Dedicated addressing**

Address-sequencing complexity for array and signal processors can range from integer counting to the complicated number patterns of FFTs. To keep addressing speeds high, the 29500 series generates addresses in parallel to the data path. However, other architectural considerations must also be weighed.

For a specific application, several system implications affect the choice of algorithm from the diversity of FFTs available. This choice, together with the transform length (or lengths) to be implemented, determines the address sequence to be generated. Usually, the nestedcount nature of these sequences has forced the designer to use many medium-scale integrated-circuit packages.

The 29540 is a single-chip solution to the addresssequencing problem (Fig. 2). Four control inputs allow programmed or hardwired control of the actual number of data points in the transform. From this and other control-input commands, the 29540 can be sequenced through the entire transform while providing output flags. These flags indicate when each data pass is over and when the entire transform is complete.

For their part, the 29540's control inputs accept the most common FFT formats. The designer can opt for bit-reversed output order or bit-reversed input order, radix-2 or radix-4 address sequences, and decimation-



2. Multiple sequences. Fast Fourier transforms may have unusual address sequences, and with its four control inputs, the address-sequencing 29540 chip is designed to handle all of them. It provides output flags when a calculation is complete.



3. Complete. A typical signal-processing system provides separate, parallel paths for complex data. But in the 29500 setup, address pipelining handles both data and coefficient addressing operations for fast Fourier and other common transforms.

in-frequency or decimation-in-time sequences.

The 16-bit output port of the address sequencer is controlled by the counter and transform-length-input instructions. Any transform from 2 to 65,536 points long can be selected. The higher-order bits not required for the specified transforms (a 1,024-point transform only requires 10-bit addresses) can be preloaded through a bidirectional address port to access the next data block.

### Easy address pipelining

Because the primary objective of this architecture is to operate on array- or signal-processor systems in a highly parallel manner, addresses must also be pipelined. As a result, each address must be tracked, which requires a pipeline register—such as the 29520 or 29521. These are byte-slice pipelining registers configurable as a dual two-level or a single four-level pipeline. In both devices, the single four-level configuration operates as a pushonly stack. The selection of register is determined by the designer's choice of system timing and data movement.

The architecture of a typical 29500 signal-processing

system (Fig. 3) can employ separate parallel data paths for complex data. Three possible address-generator blocks are shown, and together they represent a generalpurpose processor. Address sequences for other than FFTs might be configured from programmable read-only memory or 2901-based designs. Address pipelining is shown for both data and coefficient addresses.

In this design, either bipolar or MOS static randomaccess memories store data temporarily, and high-speed bipolar PROMs and RAMs or MOS ROMs are used for coefficient look-up tables. The local-control store may be either a PROM or a writable control-store RAM and can be controlled by a 2910 program sequencer.

A common benchmark for signal processing is the execution speed of an FFT. The 29500 processor, operating at a 10-megahertz clock rate, can perform the transform in 400 nanoseconds. This speed allows a 1,024-point complex radix-2 butterfly to be completed in 2.0 milliseconds. Compared with the best throughput available in current bit-slice CPU architectures, this figure is more than a twentyfold improvement.

## **D**ESIGN ENTRY

## One-chip sequencer shapes up addressing for large FFTs

The addressing circuitry of a single IC accesses both data and coefficient memories for performing a broad class of fast Fourier transforms.

s one of the most useful algorithms in the digital signal-processing repertoire, the fast Fourier transform provides a quick, orderly, and convenient means of computing the frequency spectrum of a signal. When combined with other operations, the FFT is also useful in correlating or convolving two or more waveforms, techniques required to perform radar, sonar, and image processing.

One of the most difficult problems facing the FFT hardware designer is creating the circuitry to address the memories that hold the data variables and coefficient constants. The difficulty arises partially because of the memory space required and the resulting complexity of either accessing a large number of personalized address tables for each FFT or calling out a large data base in the proper sequence. Even when addressing is done in software, there is the problem of speed—the method is

### **David Quong** and **Robert Perlman** Advanced Micro Devices Inc.

Robert Perlman is a senior product planning engineer with the DSP/array processing group at Advanced Micro Devices in Sunnyvale, Calif. He obtained a BSEE from the Rensselaer Polytechnic Institute and an MSEE from the Johns Hopkins University, and has previously done design work in airborne digital signal processing at Westinghouse.

David Quong is a product planning engineer with the DSP/array processing group. He received a BSEE from California State University in Sacramento. often too slow for real-time applications.

A new chip, however, contains all the addressing circuitry needed to access an FFT unit's data and coefficient memories so that a broad class of functions can be analyzed. The Am29540 programmable address sequencer is flexible enough to generate addresses for FFTs having as few as 2 or as many as 65,536 points. Twelve algorithms are supported in radix-2 and radix-4 systems, including operations on complex and real-valued input data (either inplace or non-in-place transforms); forward and inverse transforms; and decimation-intime (DIT) and decimation-in-frequency (DIF) algorithms.

### A web of nets

Included in the 16-bit sequencer are a butterfly counter (see "Generating Addresses Efficiently," p. 160), a data address generator, and a coefficient address generator (Fig. 1). The butterfly circuit actually has two counters, one for columns and one for rows. The column counter points to the current FFT stage, or column; the row counter, to the butterfly currently being performed within that stage. The counters are programmable and can be initialized to perform transforms of various lengths by prestoring the appropriate 4-bit transform-length code in an on-chip latch. The transform-length code is placed on input lines  $TL_0-TL_3$  and latched with signals

Reprinted with permission from Electronic Design, Vol.32,No.14,Copyright Hayden Publishing Co.,INC.,1984.

### DESIGN ENTRY

### **One-chip FFT sequencer**

Transform <u>Select</u> (TSEL) and Transform Strobe (TSTRB).

The butterfly counter executes one of four instructions: Reset, Reset/Load, Count, and Hold. These instructions are selected with control lines  $I_0$  and  $I_1$  and are executed on the rising edge of the Clock Input line (CP). An FFT is begun by initializing the butterfly counter with a Reset or Reset/Load instruction. The Count and Hold instructions are then

used to advance the counter to the next butterfly operation or to hold it at the present butterfly position.

The counter section generates four flags to help control FFT sequencing. The Iteration Complete flag (IT COMP) indicates the last butterfly operation performed in a stage or column; the last butterfly operation in a particular FFT is signaled by the FFT Complete flag (FFT COMP). The Even/Odd flag changes

### Generating addresses efficiently

A quick look at the structure of a fast Fourier transform reveals why the data and coefficient circuitry is so complex. At the heart of the FFT algorithm is the butterfly operation, which takes its name from the schematic representation that shows how output data is generated from an input waveform.

In the butterfly operation on a radix-2 DIT FFT (Fig. A), two complex data points, A and B, and one complex coefficient are used to compute two new complex data points, A' and B'. The coefficient is a complex exponential of the form  $e^{-j\theta} = \cos \theta - j \sin \theta$ . Each butterfly requires one complex multiplication, one complex addition, and one complex subtraction or four real multiplications, three real additions, and three real subtractions (Fig. B).

An FFT is performed by concatenating butterfly operations. The butterflies are arranged in columns, or stages; an N-point, radix -2 FFT comprises log<sub>2</sub> N stages,



each containing N/2 butterflies, and so a total of (N/2)  $(log_2N)$  operations must be done.

The structure of a 16-point FFT contains 32 butterflies (Fig. C). Each circle represents a single radix-2 butterfly operation. The fractional number accompanying each butterfly of the last three stages is the coefficient needed to perform that butterfly: if the value of the fraction is k, the corresponding coefficient value is  $e^{-j\pi k}$ . Memory locations in which data is stored are represented as blocks; in the case of the 16-point FFT, 16 contiguous memory locations must be allocated to store the 16 complex data points. Each butterfly is performed by taking input points from the data memory, doing the necessary mathematical operations with the appropriate coefficients, and returning the results. The algorithm shown is in-place, meaning that the data points produced by each butterfly are stored in the same locations as the input data points.

The order in which data must be accessed is not straightforward. For the FFT shown, data must be accessed in order 0, 8, 1, 9, ... 7, 15, for the first stage. For the second and following stages, however, data addressing is somewhat more involved. The second stage, for example, is performed by accessing addresses 0, 4, 1, 5, 2, 6, 3, 7, for the first group of four butterflies; then 8, 12, 9, 13, 10, 14, 11, 15, for a second group. The butterflies in the third and fourth stages are also addressed in groups. Stage m has  $2^{m-1}$  groups of N/2<sup>m</sup> butterflies with a group spacing of N/2<sup>m-1</sup>

Coefficients must also be accessed. For the first stage of this FFT, only sin 0 and cos 0 need be acquired. Stage two, however, uses angles 0 and  $\pi/2$ ; stage three uses  $0,\pi/2,\pi/4,$  and  $3\pi/4$ ; and stage four needs angles  $0,\pi/2,\pi/4,\pi/8,5\pi/8,3\pi/8,$  and  $7\pi/8$ . In general, the coefficient address sequence for the mth stage of this FFT is 0, BR(1) $\pi/N$ , BR(2) $\pi/N$ , ... BR ( $2^{m-1}-1$ ) $\pi/N$ , where BR(x) is a function that reverses the order of the bits of a binary word.

A new coefficient must be accessed for each group of butterflies. Other types of FFTs have various addressing sequence requirements, but this example is a good representative. FFT analyzers use several techniques to state after every stage and can be used to control memory operations for non-in-place transforms. The fourth flag, KNZ/KZ, is of special use when performing transforms with real-valued inputs. The last two flags are multiplexed onto a single pin. When the sequencer produces a data address for a transform with a real-valued-input, KNZ/KZ appears on the pin; for any other type of data, Even/Odd appears.

When performing a transform, the sequencer uses its address generators to create addresses combinatorially—that is, the data address generator produces an address for each input and output data point, and the coefficient address generator creates addresses for coefficients and weighting functions. Three control signals—PSD, DIT/DIF, and Radix4/2 —configure the address generators for various types of FFTs. These signals are stored in an

generate these data and coefficient addresses.

One of the most common solutions is to place the data and coefficient addresses in PROMs and then to sequentially address the PROMs with a counter. This approach has several serious drawbacks, however. First the number of data addresses becomes prohibitively large as the size of the FFT grows. A 4096-point, radix-2 FFT with complex inputs, for example, must addresses 24,576 butterflies, each requiring two data addresses and a coefficient address, for a total of 73,728 addresses. Although that number can be reduced by employing constant-geometry FFT algorithms that use the same data addresses for every stage, these algorithms have the disadvantage of being non-in-place, thereby requiring twice the data memory of an in-place transform. The second disadvantage of the PROM approach is that if a single system is to perform several different sizes or types of FFT, a different address table is needed for each FFT.

Another method uses as much SSI and MSI logic as is practical. This approach is easily implemented but usually results in a circuit that consumes considerable board space, is a headache to control, and takes a long time to debug. A circuit for the addressing function in a 4096-point FFT might require 10 to 20 chips.

A third approach is to compute the necessary addresses in software, a method that is often too slow for real-time applications.



### DESIGN ENTRY

### **One-chip FFT sequencer**

on-board latch controlled by the Select and Strobe lines (SEL, STRB). The user selects the desired input data, output data, or coefficient address with control lines Address Select 0 through 3 ( $AS_0-AS_3$ ). The address chosen by lines  $AS_0-AS_3$  is placed on Address lines  $A_0-A_{15}$ . Those lines can be forced to a highimpedance state by the Output Enable signal (OE), thus allowing other address generation devices to be tied to the same address bus.

### **Conserving memory**

When addressing data, the sequencer can generate as many as  $2^{16}$  addresses; the actual number needed for a particular FFT depends on the size and type of the transform. An Npoint, radix-2, in-place FFT with inputs that are complex quantities, for example, must address N complex-data points during each stage. If N is 16, only 16 memory locations need be addressed, leaving much of the available address space unused.



1. The Am29540 offers a one-chip solution to the problem of addressing data and coefficient memories for performing fast Fourier transforms. The butterfly counter can be programmed to address anywhere from 2 to more than 65,000 points.

The Am29540 offers two data-addressing options for the user who needs less than 64 kwords of space. The first sets the unused upper address bits to zero by initializing the butterfly counter with the Reset instruction. For a 16-point transform, then, the upper 12 address lines would contain 0s for any data address. The four remaining lines are available to call  $2^4$ , or 16, values. The other option is to program the upper address lines to a userselected value to address a given data block in a large memory. The upper data bits are programmed by bringing OE high, placing the desired bit pattern on address lines  $A_0 - A_{15}$ , and then executing the butterfly counter's Reset/ Load instruction. If, for example, the bit pattern  $ABC0_{16}$  is used to initialize a 16-point, complex-input FFT, the sequencer will address a block of sixteen data locations beginning at address ABC0<sub>16</sub>.

Non-in-place transforms present additional problems. Unlike in-place transforms, non-inplace algorithms cannot store the output data from a butterfly operation in the same locations previously occupied by the input data. That problem is overcome by generating both the input and output data addresses for such transforms.

Typically, non-in-place transforms are performed with two data memories, one the source of input data, the other the destination for output data. When a butterfly operation for a given stage is completed, the roles of these memories are reversed, with the output data memory of one stage providing the input data for the next. The Even/Odd signal is particularly useful in such cases; since it changes state after every stage, it can be used to control the direction of data flow between the two memories.

### Getting the coefficients

To access coefficients, the Am29540 generates a 16-bit address corresponding to one of  $2^{16}$ equally-spaced angles between 0 and  $2\pi$  radians. For coefficient address A, the angle addressed is  $2\pi A/2^{16}$  the angle  $\pi/2$ , for instance, would have the address  $4000_{16}$ . The coefficient address is fed to look-up memory, usually PROM, containing sine and cosine values for the angles selected. A given FFT will use some subset of the more than 65000 angles available. As a case in point, an N-point, radix-2 FFT with complex inputs must access only N/2 equally spaced angles in the range 0 to  $\pi$  radians; a 16-point FFT, then, needs only eight different angles. The sequencer automatically chooses the angles needed in the proper sequence, skipping over unused values.

The coefficient-addressing scheme employed carries a significant benefit for systems in which various sizes of FFTs are to be implemented. Because the chip automatically accesses only those sine and cosine values needed, a single sine/cosine table can be used to perform FFTs of various sizes. If, for example, the user creates a look-up table containing 2048 sine and cosine values between 0 and  $\pi$  radians, that table can be used to perform all radix-2 complex FFTs with 4096 or fewer points.

Most FFT algorithms currently in use are designed to process complex input data. The Am29540 supports 12 different types of this transform (see the table, below). The choices include:

•Radix-2 or radix-4 transforms. The butterfly structure of a radix-4 transform is more complicated than that of radix-2 but offers somewhat greater computational efficiency. Each radix-4 butterfly produces four output data points from four input data points and three coefficients, and consumes 12 real multiplications and 22 real additions. Radix-4 transforms are selected with the Radix4/2 signal.

•Decimation-in-time or decimation-in-frequency transforms. These terms refer to two basic classes of FFTs; they reflect the manner in which each class is derived. DIT and DIF

| Fas                | t Fou | rier transf        | orms supp                 | orted by               | the Am29                | 540                       |
|--------------------|-------|--------------------|---------------------------|------------------------|-------------------------|---------------------------|
| Input data<br>type | Radix | Decimation<br>type | In-place/<br>non-in-place | Input data<br>ordering | Output data<br>ordering | Direction of<br>transform |
|                    | 2     | DIF                | In-place                  | Normal                 | Digit-<br>reversed      | Forward and inverse       |
|                    | 2     | DIF                | In-place                  | Digit-<br>reversed     | Normal                  | Forward and<br>inverse    |
|                    | 2     | DIF                | Non-in-place              | Normal                 | Normal                  | Forward and<br>inverse    |
|                    | 2     | DIT                | In-place                  | Normal                 | Digit-<br>reversed      | Forward and<br>inverse    |
|                    | 2     | DIT                | In-place                  | Digit-<br>reversed     | Normal                  | Forward and<br>inverse    |
| Complex            | 2     | DIT                | Non-in-place              | Normal                 | Normal                  | Forward<br>inverse        |
|                    | 4     | DIF                | In-place                  | Normal                 | Digit-<br>reversed      | Forward and<br>inverse    |
|                    | 4     | DIF                | In-place                  | Digit-<br>reversed     | Normal                  | Forward and<br>inverse    |
|                    | 4     | DIF                | Non-in-place              | Normal                 | Normal                  | Forward and<br>inverse    |
|                    | 4     | DIT                | In-place                  | Normal                 | Digit-<br>reversed      | Forward and<br>inverse    |
|                    | 4     | DIT                | In-place                  | Digit-<br>reversed     | Normal                  | Forward and inverse       |
|                    | 4     | DIT                | Non-in-place              | Normal                 | Normal                  | Forward and inverse       |
| Real-valued        | 2     | DIT                | In-place                  | Normal                 | Unique                  | Forward                   |
| (RVI)              | 2     | DIF                | In-place                  | Unique                 | Normal                  | Inverse                   |

### **DESIGN ENTRY**

### **One-chip FFT sequencer**

butterfly structures differ somewhat but require an identical <u>number</u> of arithmetic operations. The DIT/DIF signal determines the desired transform type.

•In-place or non-in-place transforms. Non-inplace transforms require twice the data memory of their in-place counterparts. It might seem, then, that in-place transforms would always be preferred. Unfortunately, the in-place approach has a drawback—the digits of the address of the input or output data must be called for in reversed order. This scheme requires a reordering operation. The choice between inplace and non-in-place algorithms is made by using the appropriate values of  $A_0$ - $AS_3$  to select the desired addresses. Should the user select an in-place transform, the choice of digit-reversed-address input or output can be made with the signal PSD.

### **Useful inversions**

The sequencing chip also can be used to perform inverse transforms, a useful feature in applications requiring a route from the frequency to the time domain. Computing inverse transforms is straightforward—the address sequences needed are the same as those for the forward operations. With radix-2 transforms, the only difference between the inverse and forward transforms is the complex exponential:  $e^{-j\theta}$  must be replaced with  $e^{j\phi}$ . Changing the sign of the complex exponential's argument is equivalent to replacing the coefficient sin  $\theta$  with  $-\sin \theta$ , an operation that can be executed by slightly modifying the addition and subtraction operations performed in the butterfly. Radix-4 inverse transforms require somewhat similar minor accommodations to sign changes in the butterfly calculation.

Some applications demand FFT transforms with real-valued inputs. The sequencer generates data addresses for both forward and inverse real-valued-input (RVI) transforms of a type first described by Bergland.<sup>1</sup>

### A weighty matter

FFT filter characteristics can often be significantly improved by premultiplying the input data with a series of weighting factors. This technique, also called windowing, or shading, can significantly lower filter sidelobes and thus simplify the analysis. The properties of a number of common weighting functions are well-documented.<sup>2</sup>

The sequencer supports two weighting approaches for radix-2 transforms. The first and



2. The Sequencing operation for a 16-point FFT begins by loading the appropriate transform-length code and control signals into on-board latches. The butterfly counter is then reset. After initialization, the first butterfly's memory addresses are selected with lines AS<sub>0</sub>-AS<sub>3</sub>. The sequencer is then advanced to each succeeding butterfly using the count instruction.

### **One-chip FFT sequencer**

simplest approach is to perform a weighting prepass before the FFT begins.

The sequencer is programmed to perform the first stage of a radix-2 DIF transform. The resulting prepass data addresses access the input data, the coefficient addresses access weighting values stored in a look-up table. On completion of the prepass, the part is reprogrammed to address the type of radix-2 FFT desired.

The second approach takes advantage of the structure of a DIT FFT. For the first stage of the transform, only the coefficient values sin 0 and cos 0 are needed. Weighting can thus be incorporated in this first stage, using the stage's multiplier. By configuring the part to perform a radix-2, DIF FFT for stage 1, and then changing the FFT type from DIF to DIT for all remaining stages, the necessary data, weighting, and coefficient addresses can be generated.

### A look at an FFT

Virtually all useful weighting functions are symmetrical. If Y(n) is a symmetrical N-point weighting function, point Y(x) is equal to Y(N-x). This symmetry implies that the user need not store all N points of the weighting function: (N/2)+1 points are sufficient. The sequencer addresses such half tables by generating both x and N-x. The often-used von Hann weighting function is one such example,



3. In a typical system, the Am29540 is used to access data, weighting, and sine and cosine (coefficient) values in a microcode-controlled FFT processor. This system can support a 4-kpoint radix-2 transform.

### **One-chip FFT sequencer**

easily derived from the table of cosines required by the FFT algorithm itself. Thus, the need for a separate weighting-function memory is altogether eliminated.

The sequencer's operation can be best understood by considering its performance of a typical FFT. Suppose, then, that an in-place, radix-2, 16-point DIT FFT is to be implemented (see "Generating Addresses Efficiently," Fig. C, p. 160). To initialize the device, the appropriate transform-length code and control bits are loaded into the on-chip latches. For this example, the transform-length code has the value  $0011_2$ ; the control bits must assume the values PSD = 1, Radix4/2 = 0, and DIT/DIF = 1. After this data has been entered, the butterfly counter is initialized with a Reset or Reset/ Load instruction.

Once initialized, the part generates data and coefficient addresses for the FFT's first butterfly. For this algorithm, the input and output data addresses are set at 0 and 1, respectively, with lines  $AS_0-AS_3$ ; the coefficient address is similarly set to 8. After all the addresses have been read, the device is advanced to the next butterfly by executing a count instruction (Fig. 2).

### Defining the system

The Am29540's working environment is a microcode-driven FFT processor. That system can be divided into several basic blocks (Fig. 3): the address sequencer, arithmetic processor, high-speed data memory and coefficient memory, system controller, and the host interface.

The address computer generates the read and write addresses to access data, as well as coefficient and weighting addresses.

The arithmetic processor, consisting of a multiplier (here, the Am29517) and two ALUs (one for real and the other for imaginary data), efficiently calculates complex data from the data memory. Using coefficient and weighting generators, it processes the information and returns it to the data memory. The data width is 16 bits; therefore each ALU requires two 8-bit-slice multiport pipelined processors (here, Am29501s). A scaling shifter is provided in each data path from the memory to the ALUs.

The high-speed data memory stores input

and output data from an FFT operation. It is also divided into two banks, one for real, one for imaginary data. The sequencer can be loaded with a data address offset, allowing data memory to be addressed at starting locations other than zero, and permits the addressing of selected blocks of data. A set of coefficient generators (Am29526/27/28/29) provide the coefficients needed when performing an FFT and produce up to 2048 words of sine and cosine data. This is sufficient to support up to a 4096-point, radix-2 transform. A PROM contains the weighting values for the FFT input data.

The system controller, as overseer, accepts instructions through the host computer interface, determines which function must be performed, issues the proper instructions to other components, and informs the host when the operation is done. It employs a microsequencer (the Am2910A) and microcode memory.

The host interface consists of logic to handle the host system protocols and a DMA controller for high-speed data transfer. During block data transfer the DMA circuitry has direct access to the data memory.

The address sequencer generates both read and write addresses for the data memory. When, as is usual, the operations for a sequence of butterflies are overlapped, those addresses must be temporarily stored in an agile shiftregister pipeline. This structure must unravel the intertwined sequence of addresses for the several butterflies that are in progress at any given time. Here, a multilevel pipeline register consisting of two Am29520s is used. It can serve as dual two-level or a single four-level pipeline register, and each of the registers is available to the output at any time.  $\Box$ 

### References

1. G. D. Bergland, "A Fast Fourier Transform Algorithm for Real-Valued Series," *Communications of the ACM*, Vol. 11, Number 10, October 1968, pp. 703-710.

2. F. J. Harris, "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform," *Proceedings of the IEEE*, Vol. 66, Number 1, January 1978, pp. 51-83.

# 500-kHz single-board FFT system incorporates DSP-optimized chips

VLSI devices optimized for digital signal processing can realize architectures that, compared with traditional designs, save space, power and money. Such chips serve as the basis for a singleboard system that uses fewer than 40 standard components.

### **Robert Cohen** and **Robert Perlman**, Advanced Micro Devices Inc

By employing VLSI devices to implement the fast Fourier transform, you can build a single-board digitalsignal-processing system that supports sampling rates to 500 kHz and requires fewer than 40 packages (including processor, sequencer and local memory).

In such systems, the FFT makes possible many applications that would otherwise be unrealizable because of computational complexity. FFT techniques require a great number of calculations, and generalpurpose computers incorporating the FFT aren't fast enough for such real-time high-bandwidth signal-processing systems as radar, video processing and telecommunications. Until the introduction of VLSI devices that are optimized for DSP tasks, only expensive array processors and special-purpose systems constructed with hundreds of SSI and MSI components could serve such applications.

### **Optimize butterfly execution**

Effectively applying these VLSI circuits requires a familiarity with the FFT's computational requirements (see **box**, "FFTs reduce DFT computations"). Then, you can implement an appropriate algorithm in hardware. Because the FFT's basic operation is the butter-fly, you can start by designing a butterfly processor.

Fig 1 lists the steps required to process a butterfly. The list helps you to determine the minimal resources required: an ALU, a multiplier and enough memory to hold the real and imaginary components of N samples. By adding resources, you can increase parallelism and boost throughput. For example, separate memories for the real and imaginary components of the sample data allow you to read A or B (or write A' or B') in one cycle. Extending this concept, you can divide the data pathway into a real-variable processor and an imaginaryvariable processor (**Fig 2**).

The multiplier (or set of multipliers) acts as a shared



Fig 1—Five sequential steps implement the butterfly, which is the primitive DFT operation. This list shows that the absolute minimum resources required to implement a butterfly are an ALU, a multiplier and memory.

Reprinted with permission from EDN, October 31,1984

resource for both processors because it operates on both types of data. Each processor consists of an ALU and registers that hold intermediate results. Although you could add more ALUs, they prove superfluous for the FFT algorithm used here. (Additional ALUs are useful in radix-4 algorithms; see **reference**.)

To achieve the best performance, you minimize the number of cycles needed to execute the butterfly. Parallel computations allow the processor to accomplish more in each cycle to effect the desired reduction.

With an architecture like the one suggested—two memories, two processors and several multipliers what is the smallest number of required cycles? To find out, examine **Fig 1** and start by using two cycles to read A and B from memory. (You can store W<sup>k</sup> in a PROM and read it concurrently with A and B.) Assuming the processor has four multipliers, step 2 executes in one cycle. Step 3 also executes in one cycle if it determines



Fig 2—Separate real and imaginary data pathways allow you to share multipliers and reduce required system resources.

|   | DATA<br>BUS | REAL ALU                                                                                                 | IMAGINARY ALU                                                                                            | MULTIPLIERS                                                                                                             |
|---|-------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| 1 | A           |                                                                                                          |                                                                                                          |                                                                                                                         |
| 2 | в           |                                                                                                          |                                                                                                          |                                                                                                                         |
| 3 |             |                                                                                                          |                                                                                                          | B <sub>R</sub> W <sup>K</sup> B <sub>I</sub> W <sup>K</sup> B <sub>R</sub> W <sup>K</sup> B <sub>I</sub> W <sup>K</sup> |
| 4 |             | B <sub>R</sub> W <sup>k</sup> – B <sub>I</sub> W <sup>k</sup>                                            | B <sub>i</sub> W <sup>k</sup> <sub>R</sub> + B <sub>R</sub> W <sup>k</sup>                               |                                                                                                                         |
| 5 |             | $A_R + B_R W_R^k - B_I W_I^k$                                                                            | $A_i + B_i W_R^k + B_R W_I^k$                                                                            |                                                                                                                         |
| 6 |             | A <sub>1</sub> - B <sub>1</sub> W <sup>k</sup> <sub>R</sub> - B <sub>R</sub> W <sup>k</sup> <sub>i</sub> | A <sub>R</sub> - B <sub>R</sub> W <sub>R</sub> <sup>k</sup> + B <sub>I</sub> W <sub>I</sub> <sup>k</sup> |                                                                                                                         |
| 7 | A'          |                                                                                                          |                                                                                                          |                                                                                                                         |
| 8 | Β′          |                                                                                                          |                                                                                                          |                                                                                                                         |

Fig 3—An 8-step butterfly, which implements the algorithm prior to optimization, uses its constraining resource—the data bus—only 50% of the time.

the left-column difference in the real ALU and the right-column sum in the imaginary ALU. Step 4 requires two cycles, the left two operations being performed in the real-variable ALU and the right two in the imaginary-variable ALU. A' and B' are then written to memory in two cycles. This process executes a complete butterfly in eight cycles.

You now estimate how fast this processor can operate. Assuming that N=1024, the processor must perform  $(N(\log_2)N)/2$  butterflies (a total of 5120). At eight cycles per butterfly, the processor needs 40,960 cycles. Next, assume a 100-nsec cycle time. (Cycle time depends on the slowest pathway through the system, which is typically via the multiplier; 16×16-bit combinatorial multipliers with sub-100-nsec propagation delays are common.) Under these conditions, a 1k-sample transform requires 4 msec, corresponding to a 0.25-MHz sampling rate, which is quite respectable for many applications. Further scrutiny will reveal ways to reduce hardware and increase throughput.

#### Less hardware does the job faster

Fig 3 shows a resource-utilization table for an 8-step butterfly. Note that all resources are idle most of the time: The data bus is active only 50% of the time, the ALUS 38%, and the multipliers 13%. You can take advantage of this idle time by executing butterflies concurrently, a technique known as pipelining.

For example, after reading A and B for the first butterfly, the data bus can read A and B for the second butterfly during cycles 3 and 4 while the multipliers and ALUs are busy. The multiplier could then begin working on the second butterfly immediately after computing results for the first. Using this technique, the processor still requires eight cycles to complete a



Fig 4—By starting a second butterfly concurrently, you can create a 5-cycle loop and improve throughput 38% compared with Fig 3's operation.

particular butterfly, but it reduces the average number of cycles per butterfly because it works on more than one butterfly at a time.

The most heavily used resource determines the minimum average number of cycles per butterfly that you can achieve. By using the four idle bus cycles, you can reduce the average number of cycles per butterfly to four and double system throughput.

You can see this doubling of throughput clearly in **Fig** 4's resource-utilization table. A second concurrent but-



Fig 5—Data-bus utilization is 100% in a 4-cycle butterfly. Here you can see that using just one multiplier doesn't hinder throughput.



Fig 6—This FFT circuit uses only one multiplier and a handful of other components.

terfly starts on cycle 5; A and B are then read, and the new products are calculated in cycle 7. (You can also start the second butterfly on cycle 3 or 4.) After completing cycle 8, the processor jumps to cycle 4 instead of cycle 1, because it has already read A and B and computed the new products for the second butterfly. This technique creates a 5-cycle loop instead of an 8-cycle loop, improving throughput by 38%.

Fig 5's table shows how to achieve even higher performance. You can copy cycle 4 into cycle 8, which allows the processor to jump to cycle 5 and produce a 4-cycle loop. This action doubles the original throughput. In this case, the data bus experiences 100% utilization and the ALUS 75%, but the multipliers are still employed only 25% of the time. Clearly, you don't need four multipliers. In fact, you can achieve the same performance with only one multiplier by pipelining an additional butterfly. A design example demonstrates this technique.

### Start a butterfly every four cycles

The **Fig 6** design uses a real-variable processor and an imaginary-variable processor, each with two Am29501s to provide 16-bit precision. (The Am29501 is an 8-bit, cascadable processor comprising an ALU, a set of six registers, and three data ports.) The two processors also share an Am29517 16-bit parallel multiplier, which has two 16-bit inputs, X and Y. The Y input connects to the multiplier I/O (MIO) port on the real and imaginary 29501s; the X input is driven either by a PROM containing the complex constants W<sup>k</sup> or by Am29526/27/28/29 sine/cosine generators. The high-order 16 bits of the multiplier output (P<sub>1631</sub>) go to the 29501s' multiplier input (MI) ports, while the low-order 16 bits of the product are ignored. Memory consists of static RAM with a cycle time of less than 100 nsec.

The microcode needed to perform one butterfly is 10 cycles long (Fig 7a), but you should note two things. First, registers are never used for more than four cycles, so the processor can load them with new values every four cycles. This, in turn, means that it can start a new butterfly every four cycles.

Second, you can superimpose each line of code onto the line four cycles below it without causing resource conflicts. For example, **Fig 7b**'s code superimposes lines 1 through 4 over lines 5 through 8 to start computing a second butterfly while the first is still executing. This process repeats in **Fig 7c**'s code, where lines 5 through 8 are then superimposed over lines 9 through 12. These last four lines contain the code necessary to compute three concurrent butterflies. You must ensure only that, when the processor reads or writes A or B, it knows exactly to which butterfly the data applies.

### FFTs reduce DFT computations

Fourier transforms mathematically approximate a signal's transformation from the time domain to the frequency domain, and several algorithms implement the technique. All are based on the discrete Fourier transform (DFT), which sums time-domain samples (x(n)) that are multiplied by complex constants:

$$X(k) = \sum_{n=0}^{N-1} x(n) W^{2uk/N}$$
  
k = 0, 1, ..., N - 1,

where  $W = e^{-j\pi}$  and each X(k) is a frequency-domain Fourier coefficient. The computation of each coefficient requires N complex multiplications, where N is the number of samples. This results in N<sup>2</sup> complex multiplications.

The fast Fourier transform (FFT) reduces complex multiplications by eliminating redundant calculations, using the equation

$$X(k) = G(k) + W^{2k/N} H(k)$$
  
k = 0, 1, ..., N - 1, (1)

where G(k) is the DFT of the even samples in x(n), and H(k) is the DFT of the odd samples. (The algorithm discussed here is a radix-2 decimation-in-time algorithm; other schemes may provide additional benefits.)

### Shaving points

Based on **Eq 1**, **Fig A** shows an 8-point DFT that's divided into two 4-point DFTs, one of which operates on even samples while the other operates on odd ones. As **Fig A** shows, the results are summed to produce the 8-point DFT result. This configuration takes advantage of the fact that G(k) and H(k) have period N/2. In other words,

$$G\left(k + \frac{N}{2}\right) = G(k)$$
$$H\left(k + \frac{N}{2}\right) = H(k).$$

Each 4-point DFT requires  $N^2=16$  complex multiplications, and combining the intermediate results to obtain the eight frequency-domain coefficients requires one complex multiplication for each coefficient (the arrows represent multiplication by the noted constant). Thus, the **Fig A** transform requires a total of 40 (16+16+8) complex multiplications required to compute an 8-point DFT directly.

By repeating this process and dividing the 4-point DFTs into 2point DFTs, you can eliminate even more computations. The 8point FFT represented in **Fig B** requires eight complex multiplications for the four 2-point DFTs plus 16 other complex multiplications, for a total of 24. In general, the number of complex multiplications equals the number of columns in the representation (log<sub>2</sub>N) times the number of samples.

Another technique allows you to cut multiplications in half again. In **Fig C**, each circle represents a



Fig A—Compared with direct computation of an 8-point DFT, decomposing it into two 4-point DFTs saves 24 complex multiplications.



Fig B—This scheme, which uses four 2-point DFTs to transform eight time-domain samples, requires a total of 24 complex multiplications: eight for the four 2-point DFTs plus the 16 represented here by arrows.

sum and a difference. Using the structure in Fig D, you define A' and B' as follows:

This notation results because W is a complex exponential and therefore periodic:

$$W^{2k/N} = -W^{2(k + \frac{N}{2})/N}$$
.

Because you can use the product  $B \times W^{2k/N}$  to calculate both A' and B', the total number of complex multiplications drops to (Nlog<sub>2</sub>N)/2. This structure is the primitive operation in FFT calculations and is called a butterfly operation. Note that each circle in **Fig C** is a butterfly operation. This fact suggests a pipelined operation, optimized to execute butterflies, that can exploit the algorithm's highly repetitive nature.

### Dissecting the butterfly

Each butterfly consists of two calculations:

$$\mathbf{A}' = \mathbf{A} + \mathbf{B}\mathbf{W}^{\mathbf{k}}$$
$$\mathbf{B}' = \mathbf{A} - \mathbf{B}\mathbf{W}^{\mathbf{k}}$$

where A', B', A, B and W<sup>k</sup> are complex numbers. (Here, the exponent 2k/N is consolidated into one term, k, for simplicity.) Dividing these into their real and imaginary parts yields

$$\begin{array}{l} \mathsf{A}' \ = \ (\mathsf{A}_{\mathsf{R}} \ + \ j \ \mathsf{A}_{\mathsf{I}}) \\ \ + \ (\mathsf{B}_{\mathsf{R}} \ + \ j \ \mathsf{B}_{\mathsf{I}})(\mathsf{W}_{\mathsf{R}}{}^{\mathsf{k}} \ + \ j \ \mathsf{W}_{\mathsf{I}}{}^{\mathsf{k}}) \\ \mathsf{B}' \ = \ (\mathsf{A}_{\mathsf{R}} \ + \ j \ \mathsf{A}_{\mathsf{I}}) \\ \ - \ (\mathsf{B}_{\mathsf{R}} \ + \ j \ \mathsf{B}_{\mathsf{I}})(\mathsf{W}_{\mathsf{R}}{}^{\mathsf{k}} \ + \ j \ \mathsf{W}_{\mathsf{I}}{}^{\mathsf{k}}). \end{array}$$

Expanding the products gives

$$\begin{array}{l} A' &= (A_{R} + j A_{I}) + (B_{R} W_{R}^{k} + \\ & j B_{I} W_{R}^{k} + j B_{R} W_{I}^{k} - B_{I} W_{I}^{k}) \\ B' &= (A_{R} + j A_{I}) - (B_{R} W_{R}^{k} + \\ & i B_{I} W_{P}^{k} + i B_{P} W_{I}^{k} - B_{I} W_{I}^{k}) \end{array}$$

Dividing A' and B' into their real



Fig C—In this DFT representation, intermediate results are multiplied by the complex constant  $W^{\mu\nu}$  and then summed. Note that each sum of products, illustrated by a circle, is actually a butterfly operation.



Fig D—The 2-point DFT, or butterfly, is the primitive operation in FFT calculations. Fig C includes many such operations and thus lends itself to a pipelined implementation.

and imaginary parts yields four equations:

 $\begin{array}{l} A_{R}{'} = A_{R} + B_{R} \ W_{R}{^{k}} - B_{I} \ W_{I}{^{k}} \\ B_{R}{'} = A_{R} - B_{R} \ W_{R}{^{k}} + B_{I} \ W_{I}{^{k}} \\ A_{I}{'} = A_{I} + B_{I} \ W_{R}{^{k}} + B_{R} \ W_{I}{^{k}} \\ B_{I}{'} = A_{I} - B_{I} \ W_{R}{^{k}} - B_{R} \ W_{I}{^{k}} \\ \end{array}$ 

These equations share common terms that need be calculated only once per butterfly. Regrouping terms yields additional savings:

 $\begin{array}{l} A_{\mathsf{R}}{'} = A_{\mathsf{R}} + (B_{\mathsf{R}} \; W_{\mathsf{R}}{^{\mathsf{k}}} - B_{\mathsf{I}} \; W_{\mathsf{I}}{^{\mathsf{k}}}) \\ B_{\mathsf{R}}{'} = A_{\mathsf{R}} - (B_{\mathsf{R}} \; W_{\mathsf{R}}{^{\mathsf{k}}} - B_{\mathsf{I}} \; W_{\mathsf{I}}{^{\mathsf{k}}}) \\ A_{\mathsf{I}}{'} = A_{\mathsf{I}} + (B_{\mathsf{I}} \; W_{\mathsf{R}}{^{\mathsf{k}}} + B_{\mathsf{R}} \; W_{\mathsf{I}}{^{\mathsf{k}}}) \\ B_{\mathsf{I}}{'} = A_{\mathsf{I}} - (B_{\mathsf{I}} \; W_{\mathsf{R}}{^{\mathsf{k}}} + B_{\mathsf{R}} \; W_{\mathsf{I}}{^{\mathsf{k}}}). \end{array}$ 

You can now determine the number of calculations necessary per butterfly: four multiplications to compute  $B \times W^{\kappa}$ , a subtraction to calculate real A' and B', an addition to calculate imaginary A' and B', two final additions for A', and two final subtractions for B'. This process yields a total of four products, three additions and three subtractions per butterfly.

A' = A + B' = A -

A' = A + B' = A - D'

|    |      |          | REAL                |    |                |    |                |                | IMAGINARY |                     |                |                |    |    |                |    | MULTIPLIER |                               |  |
|----|------|----------|---------------------|----|----------------|----|----------------|----------------|-----------|---------------------|----------------|----------------|----|----|----------------|----|------------|-------------------------------|--|
|    | STEP | DIO      | ALU                 | A, | A <sub>2</sub> | A3 | B <sub>1</sub> | B <sub>2</sub> | В3        | ALU                 | A <sub>1</sub> | A <sub>2</sub> | A3 | B, | B <sub>2</sub> | Вз | MIO        | MULT                          |  |
|    | 1    | READ B   |                     |    |                |    | DI             |                |           |                     |                |                |    | DI |                |    |            |                               |  |
|    | 2    |          |                     |    |                |    | Π              |                |           |                     |                |                |    |    |                |    | BR         |                               |  |
|    | 3    | READ A   |                     | Dİ |                |    |                |                |           |                     | Dİ             |                |    |    |                |    |            | B <sub>R</sub> W <sub>R</sub> |  |
| BK | 4    |          | A <sub>1</sub> – MI | Π  |                | MI |                | AĻU            |           |                     |                |                |    | Π  |                |    | B          | B <sub>R</sub> W <sub>I</sub> |  |
| BK | 5    |          |                     |    |                |    |                |                |           | A <sub>1</sub> – MI |                |                | MI |    | ALU            |    |            | B <sub>I</sub> W <sub>R</sub> |  |
|    | 6    |          | $A_1 + A_3$         | Т  | AĻU            | Π  |                |                |           | $B_2 - MI$          |                |                |    |    | AĻU            | МІ |            | B, W,                         |  |
|    | 7    |          | $B_2 + MI$          |    |                |    | MI             | AĽU            |           | $A_{1} + B_{3}$     |                | AĻU            |    |    |                |    |            |                               |  |
|    | 8    | WRITE B2 |                     |    | Π              |    |                |                | Γ         | $A_{2} + A_{3}$     |                | AĻU            |    |    |                |    |            |                               |  |
|    | 9    |          | $A_2 - B_1$         |    | AĽU            |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 10   | WRITE A2 |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 11   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 12   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 13   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 14   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 15   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 16   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 17   |          |                     |    |                |    |                |                |           |                     |                |                |    |    |                |    |            |                               |  |
|    | 18   |          |                     |    |                |    |                |                |           |                     |                |                |    | Ľ  | I              |    |            |                               |  |

(b)

|      |      |          |                                 |    | REAL           | L  |    |    |    |    |                     | 11 | MAGIN          | AAR | Y              |                |                | MUL            | TIPLIER                       |
|------|------|----------|---------------------------------|----|----------------|----|----|----|----|----|---------------------|----|----------------|-----|----------------|----------------|----------------|----------------|-------------------------------|
|      | STEP | DIO      | ALU                             | Α, | A <sub>2</sub> | A3 | B, | B  | 32 | B3 | ALU                 | A, | A <sub>2</sub> | A3  | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | MIO            | MULT                          |
|      | 1    | READ B   |                                 |    |                |    | ΡI |    |    |    |                     |    |                |     | DI             |                |                |                |                               |
|      | 2    |          |                                 |    |                |    |    |    |    |    |                     |    |                |     |                |                |                | B <sub>R</sub> |                               |
|      | 3    | READ A   |                                 | ΡI |                |    |    |    |    |    |                     | ٦I |                |     |                |                |                |                | B <sub>R</sub> W <sub>R</sub> |
| BK   | 4    |          | A <sub>1</sub> – MI             |    |                | MI |    | AL | ĻŪ |    |                     |    |                |     | T              |                |                | В              | B <sub>R</sub> W <sub>I</sub> |
| - BK | 5    | READ B   |                                 |    |                |    | ΡI |    |    |    | A <sub>1</sub> – MI |    |                | MI  | ΡI             | ALU            |                |                | B <sub>I</sub> W <sub>R</sub> |
|      | 6    |          | $A_1 + A_3$                     |    | AĻU            |    |    |    |    |    | B <sub>2</sub> – MI |    |                |     |                | AĻU            | МІ             | B <sub>R</sub> | B, W,                         |
|      | 7    | READ A   | $B_2 + MI$                      | Ы  |                |    | MI | A  | Ĺυ |    | $A_1 + B_3$         | ΡI | AĻU            |     |                |                |                |                | B <sub>R</sub> W <sub>R</sub> |
|      | 8    | WRITE B2 | A <sub>1</sub> – MI             |    |                | МI | Π  | A  | ĻU |    | $A_2 + A_3$         |    | AĻU            |     | Π              |                |                | В              | B <sub>R</sub> W <sub>I</sub> |
|      | 9    |          | A <sub>2</sub> - B <sub>1</sub> |    | ALU            |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 10   | WRITE A2 |                                 | Τ  |                |    |    |    |    |    |                     | T  |                |     |                |                |                |                |                               |
|      | 11   |          |                                 |    | 4              |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 12   |          |                                 |    |                |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 13   |          |                                 |    |                |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 14   |          |                                 |    |                |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 15   |          |                                 |    |                |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 16   |          |                                 |    |                |    |    | Γ  |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 17   |          |                                 |    |                |    |    |    |    |    |                     |    |                |     |                |                |                |                |                               |
|      | 18   |          |                                 |    |                |    |    | Γ  |    |    |                     |    |                |     |                |                |                |                |                               |

(C)

|                                                   |      |          |                                 |     | REA | L  |                |                |    | IMAGINARY           |                |                |    |                |                |                | MULTIPLIER     |                               |
|---------------------------------------------------|------|----------|---------------------------------|-----|-----|----|----------------|----------------|----|---------------------|----------------|----------------|----|----------------|----------------|----------------|----------------|-------------------------------|
|                                                   | STEP | DIO      | ALU                             | A,  | A2  | A3 | B <sub>1</sub> | B <sub>2</sub> | B3 | ALU                 | A <sub>1</sub> | A <sub>2</sub> | A3 | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | мю             | MULT                          |
|                                                   | 1    | READ B   |                                 |     |     |    | рI             |                |    |                     |                |                |    | ΡI             |                |                |                |                               |
|                                                   | 2    |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                | B <sub>R</sub> |                               |
|                                                   | 3    | READ A   |                                 | ΡI  |     |    |                |                |    |                     | ΡI             |                |    |                |                |                |                | B <sub>R</sub> W <sub>R</sub> |
| $\mathbf{A}' = \mathbf{A} + \mathbf{B}\mathbf{K}$ | 4    |          | $A_1 - MI$                      |     |     | MI |                | AĻU            |    |                     |                |                |    | Т              |                |                | B,             | B <sub>R</sub> W <sub>1</sub> |
| $\mathbf{B}' = \mathbf{A} - \mathbf{B}\mathbf{K}$ | 5    | READ B   |                                 |     |     |    | DI             |                |    | A <sub>1</sub> – MI | Π              |                | МI | ΡI             | ALU            |                |                | B <sub>I</sub> W <sub>R</sub> |
|                                                   | 6    |          | $A_{1} + A_{3}$                 |     | AĻU |    |                |                |    | $B_2 - MI$          |                |                |    |                | AĻU            | МІ             | B <sub>R</sub> | B, W,                         |
|                                                   | 7    | READ A   | $B_2 + MI$                      | ΡI  |     |    | МI             | ALU            |    | $A_1 + B_3$         | DI             | ALU            | П  |                |                |                |                | B <sub>R</sub> W <sub>R</sub> |
|                                                   | 8    | WRITE B2 | A <sub>1</sub> – MI             |     | П   | MI |                | AĻU            |    | $A_2 + A_3$         | Π              | ALU            |    | Π              |                |                | B              | B <sub>R</sub> W <sub>I</sub> |
|                                                   | 9    | READ B   | A <sub>2</sub> - B <sub>1</sub> |     | ALU | Π  | рI             |                |    | A <sub>1</sub> – MI | Π              |                | ΜI | ΡI             | ALU            |                |                | B <sub>I</sub> W <sub>R</sub> |
|                                                   | 10   | WRITE A2 | $A_{1} + A_{3}$                 |     | ALU |    |                |                |    | $B_2 - MI$          |                |                |    |                | AĻU            |                | B <sub>R</sub> | B, W,                         |
|                                                   | 11   | READ A   | $B_2 + MI$                      | DI. |     |    | MI             | ALU            |    | $A_1 + B_3$         | ΡI             | ALU            |    | ŀ              |                |                |                | B <sub>R</sub> W <sub>R</sub> |
|                                                   | 12   | WRITE B2 | A <sub>1</sub> – MI             |     |     | MI |                | ALU            |    | $A_2 + A_3$         |                | ALU            |    |                |                |                | B              | B <sub>R</sub> W <sub>I</sub> |
|                                                   | 13   |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                |                |                               |
|                                                   | 14   |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                |                |                               |
|                                                   | 15   |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                |                |                               |
|                                                   | 16   |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                |                |                               |
|                                                   | 17   |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                |                |                               |
|                                                   | 18   |          |                                 |     |     |    |                |                |    |                     |                |                |    |                |                |                |                |                               |

Fig 7—The microcode needed to execute one butterfly in Fig 6's circuit is 10 cycles long (a). By following the code in (b) you can start computing a second butterfly concurrently. The code in (c) starts a third concurrent butterfly.

### Pipeline the FFT processor to reduce bus's 50% idle time

To help keep these values straight, the circuit uses the Am29520 multilevel pipeline register. Fig 8 depicts how this device operates in the address pathway to memory. Every four clock cycles, at the beginning of a new butterfly, the Am29540 FFT address sequencer generates a new set of addresses for A and B. The 29520s store these addresses temporarily in internal registers that are configured as a 4-deep pipeline. As each new address is clocked into the first pipeline register, previously stored addresses advance to the next register. You can select any register for output and access the appropriate address for any microcode cycle. Fig 9 illustrates the order in which this design stores and retrieves data addresses.

The 29540 also generates addresses for the PROM containing  $W^k$ . It creates a new address for each butterfly and then stores it in an external register. Because complex products are computed on successive cycles, the address to the PROM changes at the beginning of each new butterfly (that is, every four cycles).

Microcode lines 9 through 12 then execute as a loop until the 29540's FFT Complete signal goes active. The entire transform requires only 12 words of microcode: The first eight preload the pipeline, while the last four perform the computations.

### Word size can almost triple

In these FFT implementations, you must be concerned about word growth. Because the FFT butterfly produces outputs by adding terms, butterfly outputs may require more bits than each input has. Specifically, consider the equation  $A' = A + BW^k$ , where A', A, B and  $W^k$  are complex. This equation, one of the butterfly's two basic calculations, represents vector rotation and addition. The term  $BW^k$  merely describes a rotation of vector B by unit vector  $W^k$ ; the result is added to vector A. The magnitude of A' can therefore be twice as large as A or B.

Unfortunately, this problem is more insidious than it appears. Although complex magnitudes do no more than double at each stage, the real and imaginary components of these complex values can increase by more than that amount. Indeed, they can increase by  $1+\sqrt{2}$ , or 2.41, for decimation-in-time algorithms, which is the type used here. They can even increase by  $2\times\sqrt{2}$ , or 2.82, for decimation-in-frequency algorithms, which use a different butterfly technique (see **reference**). In either case, you must allow for as much as two bits of growth in every stage. You could design a system with sufficient extra growth bits, but this approach is wasteful and expensive, particularly if the transform has many stages.

An inexpensive alternative is to use the block-floating-point scheme. This technique uses a common block



Fig 8—The system must keep track of many sets of data when computing concurrent butterflies. To do so, it must incorporate an FFT address sequencer.

|       |                      |                 | ;               | -               |                       |                 |                  |
|-------|----------------------|-----------------|-----------------|-----------------|-----------------------|-----------------|------------------|
| CYCLE | MEMORY               | 29520<br>INPUT  | REG<br>A1       | REG<br>A2       | REG<br>B <sub>1</sub> | REG<br>B2       | 29520<br>OUTPUT  |
| 0     |                      | *B <sub>0</sub> |                 |                 |                       |                 |                  |
| 1     | READ B <sub>0</sub>  |                 | •Во             |                 |                       |                 | *В <sub>о</sub>  |
| 2     |                      | *A <sub>0</sub> | ۴B              |                 |                       |                 |                  |
| 3     | READ A               |                 | *A <sub>0</sub> | *В <sub>0</sub> |                       |                 | *A <sub>0</sub>  |
| 4     |                      | •в,             | *A0             | *B0             |                       |                 |                  |
| 5     | READ B <sub>1</sub>  |                 | •В,             | ۰A°             | *B0                   |                 | *B <sub>1</sub>  |
| 6     |                      | *A,             | *B,             | *A <sub>0</sub> | *B <sub>0</sub>       |                 |                  |
| 7     | READ A1              |                 | *A <sub>1</sub> | *B,             | *A <sub>0</sub>       | *B <sub>0</sub> | *A,              |
| 8     | WRITE B              | *B2             | *A,             | •В,             | *A0                   | *B0             | *В <sub>0</sub>  |
| 9     | READ B <sub>2</sub>  |                 | *B2             | *A,             | *B,                   | *A0             | *B2              |
| 10    | WRITE A <sub>0</sub> | *A2             | *B2             | *A1             | *B,                   | ۰A              | *A <sub>0</sub>  |
| 11    | READ A2              |                 | *A2             | *B2             | *A,                   | *B,             | * A <sub>2</sub> |
| 12    | WRITE B              | *B3             | *A2             | *B2             | *A <sub>1</sub>       | *B <sub>1</sub> | *B <sub>1</sub>  |

\*indicates "address of"

Fig 9—Four registers in the address sequencer prove sufficient to store the various data addresses needed to compute three concurrent butterflies.

### Efficient microcode executes three butterflies concurrently



Fig 10—To avoid data overflow caused by word growth, implement a block-floating-point technique with 4-bit shifters inserted into the data-read pathway.

exponent for all data. If the system expects or detects an overflow, it shifts data to the right and increments the block exponent.

The circuit shown in Fig 10 implements this approach with two Am25S10 4-bit shifters inserted between memory and the real and imaginary processors in the data-read pathway. The shifters allow you to divide data read from memory by 1, 2 or 4. Each time the system writes data to memory, external logic compares the two high-order data bits to the sign bit.

If the high-order bits differ from the sign bit, the data's magnitude has expanded into the high-order bits, and an overflow could occur in the next column of butterflies because data could increase by 2.41. Consequently, if logic detects an expansion into the highorder bits, it sets a flag. Then, when the next column begins (signaled by Iteration Complete from the 29540), the system reads all data as shifted to the right by zero bits (if no expansion took place), by one bit (if the expansion occurred only in  $D_{13}$ ) or by two bits (if the expansion occurred in  $D_{14}$ ). Note that the sign bit must be duplicated in the high-order bits. Upon receipt of Iteration Complete, the block-exponent counter increments by 0, 1 or 2. The host CPU can then read this value to determine the Fourier coefficients' absolute magnitude.

Though the 29540 FFT address sequencer has many operating modes to accommodate varying architectures and algorithms, the system described here executes a radix-2 decimation-in-time transform that doesn't produce data in a convenient sequence. You'd like data for the first frequency notch to occupy the lowest memory location, data for the second notch to occupy the next lowest location, and so on. To remedy this situation, you can either scramble data points before they enter the algorithm so that they emerge in the proper sequence, or you can scramble them afterwards.

Although this article's architecture describes a special-purpose FFT processor, you can use it as a generalpurpose signal processor. Many signal-processing algorithms have a sum-of-products notation that is well suited to this design. Essentially, you can substitute the PROM that contains  $W^k$  with a RAM that the host processor loads. In this way, you can easily implement windowing and scaling operations. **EDN** 

### Reference

Oppenheim, Alan V and Schafer, Ronald W, Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1975.

### Authors' biographies

Robert Cohen worked as a design engineer in product planning at Advanced Micro Devices (Sunnyvale, CA) from 1981 to 1984. He is now a private consultant. He received a BSE degree in computer science and engineering from the University of Pennsylvania in 1981. His favorite food is a combination of guacamole and knishes.

Robert Perlman is a senior product planning engineer with the DSP/array processing group at Advanced Micro Devices. He holds a BSEE degree from the Rensselaer Polytechnic Institute and an MSEE degree from Johns Hopkins University, and he has done design work in air borne digital design processing for Westinghouse.





## Trim DSP overhead by changing your sampling rate

The sampling rate of a signal may be altered very easily as it passes through the various stages of a digital signal processing system. This can reduce the number of cycles required to perform operations in the digital domain.

> Kenn Lamb Advanced Micro Devices Sunnyvale, California

merging digital signal processing techniques require arithmetically intensive real-time processing. Each sample fed into your system must be operated on many times. Therefore, the performance required of the DSP processor is determined by both the type of processing to be applied to the signal and by its sample rate.

The sampling rate of a DSP system is usually determined by the analog-to-digital converter at the front end of the system. The choice of this sampling rate affects how well available arithmetic resources are used. Here's how to create a highly efficient system.

The techniques used are called "decimation" and "interpolation." Both are used to decrease and increase the sampling rate. respectively. When implemented using digital signal processing components (such as slices, programmable sequencers, and multipliers), these methods allow the construction of very efficient narrowband filters which can outperform direct implementations of the desired filter.

Reprinted with permission from INTEGRATED CIRCUITS MAGAZINE May 1985, with all rights reserved.

Fig. 1. A sampled signal's spectrum is repeated at periodic intervals, centered at integer multiples of the sampling frequency. The bandwidth of the images must be less than the sampling frequency to avoid aliasing.





The sampling rate of a DSP system must satisfy a number of criteria, beginning at the front end within the analog-to-digital (A/D) converter. In theory, it is only necessary to sample a signal at a rate greater than twice the signal's bandwidth.

In practice, this is not possible, because the band of interest may not extend from DC, and "brickwall" antialiasing filters are not available. Nevertheless, certain techniques allow us to approach the theoretical minimum.

You—as a designer—could for example, shift the signal's frequency content, so that the band of interest extends from DC to ensure that the maximum signal frequency is equal to the signal's bandwidth. Quadrature sampling, another technique, extends this approach by allowing two analog-to-digital converters to share the work.

Finally, second-order sampling permits the signal to be examined at twice the bandwidth, even if there are higher frequencies present. In the latter case, the sampling rate reduction is achieved at the cost of taking twice as many samples; it also presents an onerous filtering problem to the first stage of the digital signal processor. A sampling rate reduction greater than two must be achieved if this technique is to be of any benefit.

### **Avoid Last Resorts**

These sampling techniques are usually a last resort, used when the analog-to-digital conversion task would otherwise border on the impossible. There are only a few such applications in which the sampling rate will approach the theoretical minimum.

In other situations, there are advantages in oversampling the input signal: reduced specifications for antialiasing filters and

MAY 1985

improved resolution from A/D converters. These advantages usually dictate an initially high sampling rate, one that is invariably maintained throughout the rest of the system, and which therefore results in inefficient use of the available resources.

A block diagram of a DSP system is usually drawn as a cascade of processing stages, each performing different operations on the signal. This is conceptually the simplest way to specify and analyze the processing, but it is rare for the individual blocks to map directly into separate pieces of hardware.

All digital signal processing algorithms are based on the same set of arithmetic operations, typically addition, multiplication, and multiplication/accumulation. An arithmetic processor specifically tailored to DSP applications can perform all the operations specified within the separate sections of the system block diagram.

### The Fewer the Better

The objective of the system designer is to achieve all the processing required with the least number of processor units. Obviously, any reduction in the number of cycles needed to execute individual stages of the processing leads to overall savings. Typically, it may result in a reduction in processor units, an increase in the number of channels that may be accommodated, or higher processing quality in cases in which only a single DSP unit is used.

### Intelligent Rationing

These benefits result from maintaining an efficient ratio between a signal's sampling rate and its bandwidth, as the overall frequency content of the signal is modified by its passage through the digital signal processing system.

Additional processing savings accrue from integrating the sampling rate changes directly into the processing stages themselves. These savings are most apparent in operations such as lowpass or narrowband filtering.

### Modify Those Sample Rates

The sampling rate of a signal may be modified by either removing unnecessary samples or calculating and inserting additional samples. These techniques make up decimation and interpolation, respectively. In order to understand the effects on a signal of the interpolation and decimation processes, it is important to be familiar with the frequency domain representation of a sampled signal.

The Nyquist Criterion, so familiar to workers in the field of DSP, avoids aliasing distortion by specifying a minimum ratio between the sampling and maximum signal frequencies. In practice, this minimum ratio of two is often exceeded to alleviate the rolloff specification of the antialiasing filter.

In the frequency domain, the sampled signal's spectrum is repeated at periodic intervals, centered at integer multiples of fs (the sampling frequency). See Fig. 1. The bandwidth of these images must be less than fs or they will overlap, a condition termed aliasing.

Decimation and interpolation change fs and hence also alter the interval at which the images repeat. These images may, therefore, be moved selectively closer together or further apart.

### **Decimation Explained**

Given that a signal is oversampled (that is, the maximum signal frequency is less than half the sampling frequency), the sampling rate may be reduced by eliminating unnecessary samples. At first sight, this may seem to entail simply removing a number of the samples from the time record.

To avoid catastrophic distortion of the signal, however, the time increment between each sample must be the same. This implies that the minimum achievable decimation ratio is a factor of two, corresponding to the elimination of every other sample.

Such a technique has been used for multistage filters and is termed "decimation by octaves." While it is possible to decimate in this manner by an integer ratio, it is rare to have such grossly oversampled signals in a real application.

Lowpass filtering leads to oversampled signals, as it is the high frequencies that are attenuated. Therefore, decimation techniques are usually associated with lowpass filters. Assuming that a signal is oversampled by a factor of less than two, lowpass filtering must occur before any decimation to avoid aliasing distortion. A simplistic approach would be to filter the signal and then decimate by discarding unwanted filter outputs.

If filter outputs are to be discarded, then why bother to calculate them in the first place? Unfortunately, recursive filter structures require all outputs to be calculated, since these outputs are fed back into the filter to influence subsequent outputs. Transversal filters do not suffer from this restriction and, consequently, permit more efficient lowpass decimating filters.

Assume that a lowpass filter of N coefficients filters a signal resulting in an output oversampled by a factor of P. Without decimation, the filter would have

INTEGRATED CIRCUITS MAGAZINE

### COMPLEX HETERODYNING-HOW MUCH PROCESSING POWER?

Why complex heterodyning? Just what is it? For an answer, consider that digital systems take full advantage of quadrature frequency shifting techniques. Unlike analog systems, there is no possibility of frequency or phase drift.

As such, let's look at some real continuous signals, such as those diagrammed in the accompanying series of figures. Take a look at the "snapshot" of Figure Aa. Multiplying this continuous signal by the cosine of frequency fc (illustrated in Figure Ab), yields a spectrum (Figure Ad).

This spectrum displays significant aliasing between thetwo frequencies —fc and fc. However, multiplying the original signal by the sinusoid (Figure Ac) yields another spectrum (Figure Ae).

Then, combining the information (that shown in Figure Ad and Ae) allows the reconstruction of the original signal. It is shifted in frequency by fc. This technique is known as complex heterodyning.

### **Repeat Performance**

In the sampled world, the original signal repeats at intervals of fs, the sampling frequency, as illustratea in Figure Af. The sampled equivalents of the SIN and COS have the form shown in Figures Ah and Ag, respectively.

Multiplication of the original signal by these two sinusoids yields the spectra (Figures Aj and Ai).

Again, the original signal information, with a shift in

frequency of —fc, may be extracted (as indicated by Figure Ak).

From a practical point of view, the original real signal of sampling rate fs has been converted into a complex signal with the same sampling rate. This implies that twice as much processing will now be required to accommodate the real and imaginary components of the complex signal.

Closer examination reveals that the spectra (of Ai and Aj) contain duplicated information that may be removed with lowpass filters and decimation of both components of the complex signal (by a factor of two). The composite sampling rate of the complex components is the same as that of the original signal, while still extracting the spectrum of Figure Al.

### Decimation

The process of decimation may be thought of as oc-







curring in two stages: a lowpass filtering stage, followed by sampling rate reduction. The original signal (Figure Ba) of maximum frequency fo is sampled at a frequency, fs.

The signal is then passed through a lowpass filter with a cutoff frequency of fc. The resulting signal (shown in Figure Bb), may be decimated by a factor P to yield the spectrum of Figure Bc. This latter spectrum has a sampling rate of fs/P where P=2.

### Interpolation

Next comes interpolation. Interpolation is the opposite of decimation and is again achieved in two stages. The first stage involves padding the sampled signal with (Q-1) zero-valued samples between each of the original samples.

This operation changes the spectrum of the original signal (that of Figure Ca) to that of Figure Cb, for Q=2. The spectrum (Figure Cb) has much the same form as that in Figure Ca, the major difference being labelling of the frequency axis.

The effective sampling rate has been increased to Q\*fs, as expected; however, the signal's spectrum now contains (Q-1) additional images. A lowpass filter, with cutoff frequency fo, will extract the desired portion of the spectrum (from Figure 8b) to yield the interpolated signal of Figure 8c. This lowpass filter must exhibit a gain of Q to compensate for the energy lost in filtering out the (Q-1) signal images.



### INTEGRATED CIRCUITS MAGAZINE

to perform N operations per input point. With decimation, this is reduced to N/P. The result is only one output for every P input.

The throughput of this filter, as well as the rest of the processing downstream, is therefore increased by a factor of P. The filter does its job in fewer cycles, thereby reducing the processing burden on all subsequent operations—a true "win-win" scenario!

### Interpolation, Too

Interpolation is the opposite of decimation. This operation involves increasing the sampling rate to create an oversampled signal. It cannot be used to recover signals distorted by aliasing, because this type of distortion is irreversible in the majority of situations.

Typical applications for interpolation include the reduction of the output reconstruction filter specification in audio systems or the smoothing of gaps between discrete line spectra in the output from a fast Fourier transform (FFT).

The first stage in interpolating a signal involves padding the signal with extra zero-valued samples. Since the previously stated restrictions about keeping the sampling interval constant apply, interpolation must be done by integer ratios. This is not a limitation, because interpolation is usually implemented to significantly increase the number of output samples.

The original signal, sampled at fs, repeats at intervals of fs as dictated by sampling theory (Figure 2a). Padding the signal with zeros increases the sampling rate, but also changes the form of the signal.

In the case of three zeros placed between each of the original samples (interpolating by a factor of four), the effective repetition frequency becomes four times fs. However, the signal's bandwidth increases in proportion, so that three of the signal's images are now included as part



Fig. 3. Lowpass filtering in the decimation process needs a transition bandwidth of 15 percent of the interpolated signal's bandwidth (a).
The original signal is lowpass filtered before interpolation (b). The new signal is interpolated to yield another signal (c); further decimation gives the output (d).

of its spectrum (Figure 2b).

The frequency domain representation of the signal has not changed in shape; the padding has the effect of relabelling the frequency axis. To obtain the desired interpolation operation requires—surprisingly—the use of a lowpass filter. This filter removes the three unwanted images, yielding the original signal, oversampled by the desired factor of four.

At first sight this might appear to be a difficult lowpass filtering task, requiring operation at the increased sampling rate. But this need not be the case if the transversal filter structure is used.

Clearly, three out of four inputs to the filter are zero and, consequently, will not contribute to the filter output. These samples may, therefore, be skipped over when performing the filter operation saving valuable processor cycles. This technique allows the construction of efficient interpolating lowpass filters.

Let's assume that an N-point filter is used to interpolate a signal by a factor of Q. The simplistic approach would require N\*Q operations for each one of the original samples. The interpolating lowpass filter only requires N operations per original sample, implying that the interpolating filter operates at the rate of the input data, regardless of the interpolation ratio applied.

### Cascading Processes

The ability to decimate by noninteger ratios—in particular, ratios of between one and two—is essential to make effective savings in system applications. Noninteger ratios may be achieved by cascading interpolation and decimation processes.

To interpolate a signal by a factor Q, and then to decimate it by a factor P, changes the effec-

tive sampling rate to Q/P. While Q and P themselves must be integers, the ratio of one to the other can yield non-integer decimation ratios. The interpolation operation must, of course, precede the decimation to avoid aliasing distortion.

### Two for One?

Again, both interpolation and decimation are effectively performed by a lowpass filter. What about the possibility of concatenating both operations into the same filter?

Assume that the interpolation ratio is Q. Then, for an N-point filter, N operations are required for every input point, despite the fact that Q\*N output points are produced. The decimation implies that only one output is generated for every P inputs, so that only N/P operations per input point fulfill the requirements of both operations.

It may be tempting to assume that a large value for P, the effect of which could be offset by a correspondingly large value for Q, would significantly improve the efficiency of the decimation process by reducing the effective value of N/P. However, this will not work because N is related to Q and increases proportionally with any increase in Q.

The increase in N occurs because the lowpass filtering for both the interpolation and decimation is required to select a proportionally smaller percentage of the padded signal's bandwidth. The number of points in a filter is related to the percentage of the signal's total bandwidth that is taken up by the transition band of the filter.

As Q increases, the transition bandwidth remains constant, but the bandwidth of the padded signal increases; hence, N increases proportionally with Q. This

INTEGRATED CIRCUITS MAGAZINE



means that the processing overhead involved in changing the sampling rate of signal is directly related to how closely the sampling rate of the decimated signal approaches the Nyquist rate. This is the design parameter that specifies the maximum width of the transition band.

### Less Than You Imagine

A non-integer decimating process actually requires significantly less processing than a cursory examination would indicate. For example, if one of the processing stages in a system reduces a signal's bandwidth to 0.67 of its former value, then a decimation ratio of 1.5 may be applied without aliasing. This ratio may be achieved with values for Q and P of 2 and 3, respectively.

Assume a practical sampling rate Fs of Fs = 2.6 \* Fo (where Fo is the maximum signal frequency, Figure 3a). The lowpass filter for the decimation process would need a transition bandwidth of fifteen percent of the interpolated signal's bandwidth. This gives N a value of 54. (The lowpass filtered signal is illustrated in Figure 3b.)

This new signal is interpolated to yield the signal shown in Figure 3c, which is then decimated back down to give the output of Figure 3d.

The value of P reduces the required number of operations per input sample from 54 to just 18. The output rate of the decimation stage is 0.67 of the input rate, resulting in a saving of 33 percent in the number of cycles required to execute all subsequent operations.

If the number of subsequent cycles exceed 54, then an overall saving will be achieved. For example, given 300 cycles of downstream processing, the total system savings would be: [(1-1/1.5) \* 300] - 18 = 82 cycles

per input sample, which corresponds to 27 percent of all subsequent processing. This saving enables you to reduce system size and cost or process more channels.

### **Lowpass Filter It**

If a lowpass filter is used to reduce the signal's bandwidth in the previous example, then as this function is duplicated by the decimation process, the original filter becomes superfluous. It may be discarded, saving 27 cycles per original input sample.

It should be clear that the decimating lowpass filter performs the same filtering operation as the original filter but in 18 instead of 27 cycles, a saving of thirty percent in itself. This saving is in addition to those in subsequent downstream processing that occur as a result of the reduced output rate. For these reasons, lowpass filters are usually implemented using decimating techniques.

### A Look at Hardware

A decimating/interpolating stage, such as the one required in the system example, is constructed from an enhanced finite impulse response (FIR) filter. The filter structure performs a discrete convolution, according to the following formula:

$$Yk = \sum_{n=1}^{N} Cn * X (k-n+1)$$

where Xk and Yk are the filter inputs and outputs, respectively, and Cn are the coefficients; or, for the linear phase case: The arithmetic section of the FIR filter is unchanged from the non-decimating version, since the decimation is achieved by adapting the control and addressing sequence.

The control flexibility of the components used to construct the filter determine how much of the expected savings from decimation are realized in practice. The microprogrammable environment is ideally suited to this type of application.

Let's look at a hardware example (Figure 4). The arithmetic section performs the "number crunching" and may be tailored to suit processing requirements by varying the number of multipliers, multiplier/accumulators, and ALUs used.

If desired, the arithmetic section of a simple filter may be constructed with one multiplier/accumulator (Figure 4a). Alternatively, multipliers and multiport pipelined processors(Fig. 4b) may be used for arithmetic to improve the cycle time and provide the advantage of microprogrammable control lines and overflow protection. An efficient linear phase construction may be achieved with devices like the AMD Type Am29501 multiport pipelined processor and Am-29510 16 X 16-bit parallel multiplier/accumulator combination, for example (Figure 4c), effectively doubling the throughput.

The control section sequences the operations and selects the correct data points and coefficient values for the arithmetic section. It does so by defining the appropriate addresses within the data RAM and the coefficient PROM.

For the direct or linear phase implementation of a filter, count-

$$Yk = \sum_{n=1}^{N/2} Cn * [X (k=n+1) + X (k-N+n)]$$

MAY 1985

ers enhanced by hardware, such as multilevel pipeline registers, can generate addresses. For decimating or interpolating filters, the addressing sequence becomes more complex and requires additional modulo counters; this task also requires the overall control flexibility offered by a device such as the AMD Type Am29PL141 fuse-programmable controller.

Control circuitry eliminates redundant cycles that would result from zero inputs or unwanted outputs. The method is different for the decimating and interpolating stages of the filter.

In interpolation, a normal filter sequences through all of its coefficients, multiplying each by a corresponding data sample. For a padded signal, many of these data points are zero and, consequently, do not contribute to the output.

Zero padding may be achieved by incrementing the data addresses by one, and the coefficient addresses by an interval equal to the interpolation ratio. This padding technique has the beneficial side effect of automatically avoiding all redundant operations.

However, it is necessary to keep track of the location of the first non-zero data point, since this defines the first coefficient address to be used by the filter. Any subsequent decimation operations will change this address.

To avoid redundant cycles in decimation, filter outputs that would be subsequently discarded are simply not calculated. A filter without an interpolating stage accomplishes this by writing more than one point at a time into the filter's cyclic buffer. For example, writing in two new samples for each calculation of an output gives a decimation ratio of two.



For a filter that incorporates an interpolation stage, the decimation could require the writing of zeros or samples (or both) into the cyclic buffer. Since zeros are not explicitly written into the STORE, the coefficient START address is incremented to compensate.

The coefficient START address is RESET to zero every time a true sample is written into the STORE. Depending upon the applicable decimation and interpolation ratios, a varying number of true samples and apparent zeros will be written into the cyclic buffer between each output from the filter. The relevant coefficient START address may be calculated by incrementing a Fig. 5. This decimator-based narrowband filter hardware frees the multiplier to perform heterodyning without overhead. No filtering occurs while samples are written into the cyclic buffer.

INTEGRATED CIRCUITS MAGAZINE

modulo counter by the decimation ratio. The count modulus is equal to the interpolation ratio.

Very efficient narrowband filters may be constructed, using the same techniques that resulted in efficient lowpass filters; this is because with narrowband filters, significant decreases in the signal's bandwidth occurs, allowing a high decimation ratio.

A narrowband filter can be constructed from a lowpass filter with half the bandwidth of the equivalent narrowband filter. The input signal must first be shifted to baseband, with the signal frequency lying in the center of the bandpass filter shifted to be at DC, so that it fits within the passband of the lowpass filter.

The shift operation can be done with a complex heterodyne stage that simply multiplies the input signal by a complex frequency (SIN and COS) equal to the center frequency of the bandpass filter. This complex, frequency-shifted signal is then filtered by a decimating lowpass filter, interpolated, and shifted back up to the original band with the same complex heterodyne technique. The signal may, of course, be shifted back up to any band desired.

A given narrowband filter may have many different center frequencies, each determined by the complex frequency employed in the heterodyne stage. The same filter may be used to divide a wide bandwidth signal up into a number of smaller blocks by stepping the heterodyne frequency in increments equal to the filter's bandwidth. Since the bandwidth of the narrowband filter depends on the bandwidth of the lowpass filter, it is independent of the effective center frequency. A number of lowpass filters may be stored within the arithmetic processor, allowing the narrowband filter's center frequency and bandwidth to be changed at will.

### The Complex Heterodyne

The complex heterodyne operation requires two cycles per input sample; these are used to multiply the sample by digitized values of SIN and COS, to yield the imaginary and real components of the basebanded complex signal.

The values of SIN and COS derive from the sampling of a complex sinusoid of the required frequency at a rate equal to the effective sampling rate of the signal. This lowpass filtering leads to an advantageous reduction in the bandwidth of the signal, which may be exploited by decimating the signal accordingly. There is an additional inherent decimation factor of two introduced when the signal is translated into a complex representation.

Shifting to restore the signal back to its original band is again achieved through complex heterodyning. This operation must, of course, be performed after the signal has been interpolated back to the original sampling rate, to avoid aliasing distortion.

For example, assume that a narrowband filter is required to select ten percent of the band of an input signal with transition bands occupying an additional ten percent. A direct implementation of this filter would operate at the input sampling rate and require about 160 cycles per input sample to process.

The decimating equivalent would require two cycles per input point, operating at the input sampling rate, to perform the complex heterodyne. The baseband signal occupies only 15 percent of the original signal's bandwidth, and therefore may be decimated by a factor of 6.7 resulting from values of P and Q of 20 and 3, respectively. After the initial interpolation, the transition band occupies only 3.3 percent of the interpolated signal's bandwidth, suggesting that the lowpass filter will require 242 taps.

The effective processing rate of this filter is N/P cycles per input point, resulting in an overall requirement of twelve cycles for each of the original input points. Two filters are required for the real and imaginary channels, resulting in a total processing requirement of 26 cycles per input, a saving of about eighty percent.

To shift the filtered signal back up to its original band and sampling rate at this point would not achieve any overall savings, because there will be no net decrease in the sampling rate. If further processing of the filtered signal is required, then all stages that now operate at the reduced sampling rate will benefit from the eighty percent saving in processing.

Narrowband filter hardware (Figure 5) is also based on the decimator. The complex heterodyne operations are best implemented by a multiplier using devices such as the AMD Type Am29517 multiplier and Am-29501 multiport pipelined processor as the arithmetic section.

No filtering will occur while samples are being written into the cyclic buffer, so that the multiplier is free to perform the heterodyne operation without any overhead. The necessary complex frequency coefficients may be obtained by incrementing the addresses of a pair of Am29526 sine and Am 29527 cosine generators. The ICs share the same address space and bus as the registered PROMs that contain the filter coefficients. by Bernard J New Manager Product Planning and Applications AMD 901 Thompson Place Sunnyvale, CA. 94088

### Introduction

The essential generality of general purpose computing usually prevents optimization of the processor. In digital signal processing, however, this is not usually the case. Even in multi-purpose signal processors, the algorithms to be implemented will have many similarities. In particular, they will be repetitive, intensive in both arithmetic and memory operations, and branch infrequently.

The repetitive, non-branching characteristic is exploited through extensive use of pipelining, more than would be considered advantageous in a general purpose machine. The large and predictable number of arithmetic operations and memory accesses permits an arithmetic processor to be contructed with resources balanced to match the problem at hand.

The construction of this processor will not normally allow it to be multiplexed effectively with the task of address generation. As the addresses are often the only variation in successive iterations of the program and follow a well-defined pattern, it is desirable that they are generated in an independent processor which operates concurrently with the arithmetic processor.

The Am29500 family of digital signal processing and array processing products provides devices for use in both areas. In particular, for use in the arithmetic processor, the family includes the Am29501 Multi-port Pipelined Processor and the Am29516/7 16-bit Parallel Multipliers. These constitute two major resources. The parallel multiplier is essential to signal processing calculations, and the Multi-port Parallel Processor complements this with register and ALU facilities. This device's three ports provide the communication necessary for efficient use of multiple elements. The third major resource to be managed in the arithmetic processor is the memory. Using the fast Fourier transform (FFT) as an example, this paper describes an approach by which an algorithm may be analyzed and an appropriate allocation of resources made. It should be stressed, however, that the Am29500 family is of general application, and is not limited to the FFT.

### The Balancing Act

The objective in optimizing an arithmetic processor is to provide facilities in proportion to their usage in the algorithm being implemented. In this way, the processor is balanced with no one resource lying idle while another completes its task, and all are used at peak efficiency. Increasing just one resource will not necessarily increase performance due to the creation of an imbalance.

Consider the fast Fourier transform. This comprises the repeated evaluation of what is commonly known as the FFT "Butterfly." This is shown diagramatically, together with its formulae, in Figure 1. There are two input data points, A and B, which are complex numbers. These are combined together, and with a complex coefficient, W, to form two outputs, A' and B'.

Inspection of the formulae shows that a single implementation would require one complex addition, one complex subtraction and one complex multiplication. Also five complex memory operations are required; three reads for A, B and W, and two writes for A' and B'. In terms of real operations, this reduces to four multiplications, six addition/subtractions and ten memory transactions. It is to these requirements that an FFT processor must be matched.



FIGURE 1: The FFT Butterfly

### Moving the Bottleneck

From the above analysis it may be seen that if all operations take equal time, the throughput is limited by the memory requirement, which makes it impossible to perform butterflies more frequently than every ten cycles. However, this may be reduced by exploiting two factors, which are common to many signal processing algorithms.

Firstly, the memory may be re-organized to reflect the complex nature of the data. Making the memory twice as wide but only half as deep does not increase the size of the memory, but allows for simultaneous access to real and imaginary parts, effectively doubling the memory bandwidth. Secondly, it may be noted that the multiplications are performed between data and a coefficient. If the coefficients are stored in a separate memory, they may be accessed concurrently with the data.

These changes reduce the number of data memory accesses to four. However, the butterfly will still require six cycles, as the throughput is now dominated by the ALU requirement. The bottleneck has moved from the memory to the ALU.

| For Each  | Memory Accesses | Add/Subtract | Multiply |
|-----------|-----------------|--------------|----------|
| Butterfly | 8               | 6            | 4        |

| Resource | Memo | ry Buses |   | ALU   | Multipliers |       |  |  |
|----------|------|----------|---|-------|-------------|-------|--|--|
| # Cycles | #    | Usage    | # | Usage | #           | Usage |  |  |
| 8        | 1    | 8/8      | 1 | 6/8   | 1           | 4/8   |  |  |
| 6        | 2    | 4/6      | 1 | 6/6   | 1           | 4/6   |  |  |
| 4        | 2    | 4/4      | 2 | 3/4   | 1           | 4/4   |  |  |
| 3        | 4    | 2/3      | 2 | 3/3   | 2           | 2/3   |  |  |
| 2        | 4    | 2/2      | 4 | 1.5/2 | 2           | 2/2   |  |  |

TABLE 1: FFT Resource Comparison

### Measuring Efficiency

A useful measure of efficiency is the proportion of time each resource is active. In the six cycle butterfly described above, the ALU will be used 100% of the time. The memory and multiplier will each be used on only four of the six cycles, 67% of the time. This is summarised in the first line of Table 1. This base system contains one memory, one ALU and one multiplier.

As the multiplier is an expensive resource, it is desirable to utilize it more efficiently. To do this necessitates adding more capacity to the current bottleneck, the ALU. The second ALU reduces the necessary ALU cycles to three, and moves the bottleneck to both the memory and the multiplier, each requiring four cycles. In this case the memory and multiplier efficiency is 100%, and the ALU efficiency is 75%. As shown in the table, further improvement requires that both the memory and the multiplier be duplicated. This gives a situation similar to the base system, but with twice the hardware resulting in twice the throughput.

### A Reasonable Solution

In theory this procedure could be repeated until four memories, four multipliers and six ALUs allowed a butterfly to be completed every cycle. However it is unlikely that this solution would be practical.

The problem encountered is in partitioning the memory such that two reads and two writes can be performed simultaneously. This partitioning must be consistent with the data flow of the



FIGURE 2: FFT Processor Architecture

FFT. Even the base system is not without problems. This requires that the result of a complex memory read be loaded into a real ALU. Also the inclusion of additional hardware to achieve more speed than is necessary for the application is obviously undesirable.

When comparing architectures which trade a doubling of throughput for a doubling of hardware, as occurs in the table, system integrity may be a deciding factor. While it may be convenient to build a single large, fast machine, if latency requirements allow, it may be better to alternate two slower machines. This allows for reduced operation, rather than failure, if one of the machines should fail.

The table obtained from the above analysis should be viewed only as an initial survey of the options available. Even after selecting a likely candidate is is necessary to show that the algorithm can be programmed into the processor in the number of cycles anticipated.

### Overlapped Programming

Let us assume that the single memory, two ALU, single multiplier architecture has been selected. This is shown in Figure 2. The objective is to implement the FFT butterfly in this architecture in four cycles. A simple inspection of the algorithm shows that in order to generate and store the real part of A', it necessary to read A and B, complete the real part of the multiplication, perform an addition and write the result to memory. Allowing for maximum concurrency this requires five cycles, and it obvious that the full program would be longer than that.

However, this does not imply that the goal of four cycle throughput is impossible, only that the latency will be greater than four cycles. The resources in the processor are such that if the program takes more than four cycles they will be idle for part of the time. While causality does not allow these resources to be applied to the current iteration of the program, the program repeats, and the resources may be applied to previous or following iterations.

This leads to a situation where an iteration of the program commences before the previous ones are complete, and the iterations overlap. This is similar to pipelining except that whole programs are involved and the hardware is multiplexed between overlapped programs.

The completed program for the FFT butterfly is shown in Table 2a. Ten cycles are required to complete this program. Table 2b shows the instruction stream when this program is

|      | Real     |         |    |     |     |                | Imaginary |    |         |    |     |                |                | Multiplier     |                |     |                               |
|------|----------|---------|----|-----|-----|----------------|-----------|----|---------|----|-----|----------------|----------------|----------------|----------------|-----|-------------------------------|
| Step | DIO      | ALU     | A1 | A2  | A3  | B <sub>1</sub> | B2        | B3 | ALU     | A, | A2  | A <sub>3</sub> | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | MIO | MULT                          |
| 1    |          |         |    |     |     |                |           |    |         |    |     |                |                |                |                |     |                               |
| 2    | ł        |         |    |     | 1   |                |           |    |         |    |     |                |                |                |                |     |                               |
| 3    | 1        |         |    |     | 1   |                |           |    |         |    |     |                |                |                |                |     |                               |
| 4    | I        |         |    |     |     |                |           |    |         |    |     |                |                |                |                |     |                               |
| 5    | Read B   |         |    |     |     | DI             |           |    |         |    |     |                | DI             |                |                |     |                               |
| 6    | ļ        |         |    |     |     |                |           |    |         |    |     |                | н              |                |                | BR  |                               |
| 7    | Read A   |         | DI |     |     |                |           |    |         | DI |     |                | н              |                |                |     | BRWR                          |
| 8    |          | A1-MSP  | н  |     | MSP | 1              | ALU       |    |         | н  |     |                |                |                |                | Bl  | B <sub>R</sub> W <sub>I</sub> |
| 9    | İ        |         | H  |     | н   | 1              | н         |    | A1-MSP  | н  | 1   | MSP            |                | ALU            |                |     | BIWR                          |
| 10   | 1        | A1 + A3 |    | ALU | 1   |                | н         |    | B2-MSP  | н  |     | н              |                | ALU            | MSP            |     | B <sub>I</sub> W <sub>I</sub> |
| 11   | 1        | B2+MSP  |    | н   | 1   | MSP            | ALU       | 1  | A1 + B3 |    | ALU | н              |                | н              |                |     |                               |
| 12   | Write B2 |         |    | Н   |     | н              |           |    | A2 + A3 |    | ALU |                |                |                |                |     |                               |
| 13   |          | A2 - B1 |    | ALU | 1   |                |           |    |         |    | н   |                |                |                |                |     |                               |
| 14   | Write A2 |         |    |     |     |                |           |    |         |    |     |                |                |                |                |     |                               |
| 15   | 1        |         |    |     |     |                |           |    |         |    |     |                |                |                |                |     |                               |
| 16   | 1        |         |    |     |     |                |           |    |         |    |     |                |                |                |                |     |                               |
| 17   | 1        |         |    |     |     |                |           |    |         |    |     |                |                |                |                |     |                               |
| 18   | 1        |         |    |     | 1   | I              |           |    |         |    |     |                |                |                |                |     |                               |

TABLE 2a: The Butterfly Program

|      |          |          |    | Re             | eal |                |                |                |          |    | Ima            | ginar          | y  |     |     | Mu  | Itiplier                      |
|------|----------|----------|----|----------------|-----|----------------|----------------|----------------|----------|----|----------------|----------------|----|-----|-----|-----|-------------------------------|
| Step | DIO      | ALU      | A1 | A <sub>2</sub> | A3  | B <sub>1</sub> | B <sub>2</sub> | B <sub>3</sub> | ALU      | A, | A <sub>2</sub> | A <sub>3</sub> | B1 | B2  | B3  | MIO | MULT                          |
| 1    | Read B   | A2 - B1  | н  | ALU            | н   | DI             | н              |                | A1-MSP   | н  | H              | MSP            | DI | ALU |     |     | B <sub>1</sub> W <sub>R</sub> |
| 2    | Write A2 | A1 + A3  |    | ALU            |     |                | н              |                | B2 – MSP | н  | 1              | н              | н  | ALU | MSP | BR  | BIWI                          |
| 3    | Read A   | B2 + MSP | DI | н              |     | MSP            | ALU            |                | A1 + B3  | DI | ALU            | н              | н  | н   |     |     | BRWR                          |
| 4    | Write B2 | A1 - MSP | н  | н              | MSP | н              | ALU            |                | A2 + A3  | н  | ALU            |                |    |     |     | Bi  | BRWI                          |
| 5    | Read B   | A2 - B1  | н  | ALU            | н   | DI             | н              |                | A1 - MSP | н  | н              | MSP            | DI | ALU | 1   |     | BIWR                          |
| 6    | Write A2 | A1 + A3  |    | ALU            |     |                | н              |                | B2 – MSP | н  |                | н              | н  | ALU | MSP | BR  | B <sub>1</sub> W <sub>1</sub> |
| 7    | Read A   | B2 + MSP | DI | н              | I   | MSP            | ALU            |                | A1 + B3  | DI | ALU            | H              | н  | н   |     |     | BRWR                          |
| 8    | Write B2 | A1 - MSP | н  | н              | MSP | н              | ALU            |                | A2 + A3  | н  | ALU            |                |    | 1   |     | BI  | BRWI                          |
| 9    | Read B   | A2 - B1  | н  | ALU            | H   | DI             | H              |                | A1-MSP   | н  | н              | MSP            | DI | ALU | 1   |     | BIWR                          |
| 10   | Write A2 | A1 + A3  |    | ALU            | i   |                | н              |                | B2 - MSP | н  |                | H              | н  | ALU | MSP | BR  | BIW1                          |
| 11   | Read A   | B2+MSP   | DI | H              | 1   | MSP            | ALU            |                | A1 + B3  | DI | ALU            | H              | н  | н   |     |     | BRWR                          |
| 12   | Write B2 | A1 - MSP | н  | н              | MSP | н              | ALU            |                | A2 + A3  | н  | ALU            |                |    |     |     | Bl  | BRWI                          |
| 13   | Read B   | A2 - B1  | н  | ALU            | H   |                | н              |                | A1 - MSP | н  | н              | MSP            | DI | ALU | 1   |     | BIWR                          |
| 14   | Write A2 | A1 + A3  |    | ALU            | I   |                | н              |                | B2 – MSP | н  | 1              | H              | н  | ALU | MSP | BR  | B <sub>1</sub> W <sub>1</sub> |
| 15   | Read A   | B2 + MSP | DI | н              | l   | MSP            | ALU            |                | A1 + B3  | DI | ALU            | H              | н  | H   |     |     | BRWR                          |
| 16   | Write B2 | A1 - MSP | н  | H              | MSP | н              | ALU            |                | A2 + A3  | н  | ALU            |                |    |     |     | BI  | B <sub>R</sub> ₩ <sub>I</sub> |
| 17   | Read B   | A2 - B1  | н  | ALU            | H   |                | н              |                | A1 - MSP | н  | н              | MSP            | DI | ALU | 1   |     | BIWR                          |
| 18   | Write A2 | A1 + A3  |    | ALU            | i   |                | н              |                | B2 - MSP | н  | 1              | H              | н  | ALU | MSP |     | B <sub>1</sub> W <sub>1</sub> |

TABLE 2b: The FFT Instruction Stream

restarted every four cycles. Through careful programming this can occur without the programs interfering. It may be seen from this table that the three major resources, memory, ALU and multiplier are used 100% of the time. This differs from the original estimate of 75% utilisation of the ALU, the extra 25% being accounted for by "inefficient" use in the program. However, the extra use of the resource is free, and not using it would at least have increased the latency.

Inspection of Table 2b also shows the instruction stream to be periodic, repeating every four cycles. These four cycles contain the information for the complete ten cycle butterfly program, and constitute the inner loop which must be executed to implement the FFT. At the beginning and end of the FFT, previous and following iterations are absent and the instruction stream becomes aperiodic. This must be accommodated by segments of linear code derived from the instruction stream, which fill and empty the overlap pipeline. In this case the program could be written to give the desired throughput. If this had not been the case, it would have been necessary to move to a "faster" architecture. The only outstanding task is to generate the addresses to provide the data stream into the arithmetic processor. For the specific case of the FFT this may be effected using the Am29540 FFT Address Sequencer. In other cases an address computer may be designed and programmed using the techniques described above.

### Conclusion

Using the FFT as an example, a method has been described whereby a digital signal processor may be optimized through resource management. This technique is applicable to architectures using building block components, such as the Am29500 Family. Indeed it was about this family that it was developed. Processors designed in this way exhibit maximum usage of components included.

### A New Approach to Floating Point DSP

Robert Perlman senior engineer, Product Planning Advanced Micro Devices Sunnyvale, CA 94088

#### ABSTRACT

A new high-speed, single-chip floating point processor, the Am29325, is introduced; this processor incorporates features of interest to those implementing high-performance digital signal processing systems. Processor architecture is described, and the advantages of the architecture for DSP and array processing applications are discussed. Typical small- and large-system designs are presented.

### INTRODUCTION

Floating point arithmetic engines are natural candidates for very-large-scale integration, due to the popularity of the function, and to the large amounts of design time and circuit board space needed to implement such a function in SSI and MSI. Early efforts to integrate floating point operators in a single chip or chip set usually resulted in serial-parallel designs which, while considerably faster than software floating point implementations, did not approach the speeds of fully parallel, dedicated hardware designs.

Recent improvements in process technology have made possible, for the first time, the joining of combinatorial floating point addition/subtraction and multiplication functions in a single VLSI device. The Advanced Micro Devices Am29325 Floating Point Processor contains all hardware necessary to perform high-speed, 32-bit floating point addition, subtraction, multiplication, and format conversion operations, in either IEEE or DEC floating point formats. The device also contains a flexible 32-bit data path, with facilities for local operand storage.

The integration of three elements - a combinatorial adder/subtractor, combinatorial multiplier, and data path - marks the fundamental difference between the Am29325 and previous floating point implementations. By combining these functions, the design addresses not only the problem of implementing fast floating point operators, but also the equally important problem of efficiently transferring operands from one operation to the next. The data path architecture is optimized for performing often-used arithmetic sequences, such as multiplication- accumulation and Newton-Raphson division.

The Am29325 is fabricated with the IMOX-S (for IoniMplantation, OXide isolation with Scaling) process, a refinement of earlier AMD bipolar processes. IMOX-S has a feature size of 1.5 microns; three layers of metal are used for interconnects. The Am29325 die contains 48,000 devices on 129,000 square mils of silicon, and is packaged in a 144-lead pin-grid-array. Standard cell techniques were used to reduce design time and simplify chip layout. Improvements in turnaround time were significant: custom design of the Am29116, a 16-bit bipolar microprocessor, took 51 months, while design of the Am29325, a device three times as large, took only 31 months.

The floating point processor is the first in a series of general-purpose, microprogrammable devices primarily intended for 32-bit systems. Other family members include the Am29331 microprogram sequencer, the Am29323 ALU, the Am29223 32-by-32bit fixed-point multiplier, and the Am29334 register file.

#### Am29325 ARCHITECTURE

The Am29325 comprises a high speed floating point arithmetic unit, a status flag generator, and a 32-bit data path (fig. 1).



Fig. 1: Am29325 Floating Point Processor block diagram.

Arithmetic unit - The three-port, combinatorial arithmetic unit contains a high speed adder/subtractor, a 24-by-24-bit multiplier, an exponent processor, and other logic needed to implement floating point operations. Two input ports, R and S, provide operands for the instruction to be performed. The result of an operation appears on output port F.

The arithmetic unit executes one of eight instructions (table 1). Three of the instructions - R PLUS S, R MINUS S, and R TIMES S - add, subtract, and multiply 32-bit floating point numbers. A fourth instruction, 2 MINUS S, subtracts 32-bit floating point operand S from 2. The 2 MINUS S instruction is used to perform Newton-Raphson division, a means of calculating the quotient A/B. Unlike conventional division, in which quotients are calculated with a series of subtractions and shifts, the Newton-Raphson division algorithm first calculates the reciprocal (1/B) using an iterative equation, then computes the quotient by post-multiplying the reciprocal by A.

The remaining four instructions perform data format conversions. Instructions INT-TO-FP and FP-TO-INT convert between floating point and 32-bit, 2's complement integer formats. The IEEE-TO-DEC and DEC-TO-IEEE instructions convert between IEEE and DEC floating point formats.

Instructions may be performed in either of two singleprecision floating point formats - the IEEE format, as specified in proposed standard P754, draft 10.0 (ref. 1), or the DEC F format (ref. 2). These formats are similar, each having an 8bit biased exponent, a 24-bit significand comprising a 23-bit

| MNEMONIC    | OPERATION                                                                     |
|-------------|-------------------------------------------------------------------------------|
| R PLUS S    | Add floating point operands R and S                                           |
| R MINUS S   | Subtract floating point operand S from<br>floating point operand R            |
| R TIMES S   | Multiply floating point operands R and S                                      |
| 2 MINUS S   | Constant floating point subtraction for<br>Newton-Raphson division (see text) |
| INT-TO-FP   | Convert floating point operand R to integer                                   |
| FP-TO-INT   | Convert integer operand R to floating point                                   |
| IEEE-TO-DEC | convert IEEE floating point operand R to<br>DEC floating point format         |
| DEC-TO-IEEE | convert DEC floating point operand R to<br>IEEE floating point format         |

Table 1: Floating point arithmetic unit operations

mantissa appended to an implied or "hidden" MSB, and a sign bit. There are, however, a number of differences between IEEE and DEC floating point conventions, in both the format and the manner in which operands are handled during the course of an operation. These differences are automatically accounted for when the desired format is selected.

The arithmetic unit implements four IEEE-mandated rounding modes that map the infinitely precise result of a calculation to a representable floating point value. An additional VAX-compatible rounding mode is provided for users of the DEC floating point format.

Status flag generator - The status flag generator produces six flags that report operation status. Four of the flags report exception conditions stipulated in IEEE standard P754. The first of these, the INVALID flag, indicates that an operation does not have a sensible answer; multiplying infinity by zero is one example of an invalid operation. Operations producing results either too large or too small to be represented in the selected floating point format are identified by the second and third exception flags, UNDERFLOW and OVERFLOW. The fourth exception flag, INEXACT, indicates that the result of an operation is not infinitely precise. Two additional flags not called for in the IEEE standard, ZERO and NAN, identify zero-valued or non-numerical results.

**Data path** - The integrated data path comprises two input buses, a three-state output bus, and two data feedback buses, all 32 bits wide. Operands enter the Am29325 through input buses  $R_{g-31}$  and  $S_{g-31}$ ; results exit through three-state output bus  $F_{g-31}$ . Each of the R, S, and F buses has a 32-bit edgetriggered register for data storage. An independent clock enable is provided for each register, so that new data can be clocked in or old data held. Input registers R and S, and output register F can be made transparent independently. When all three registers are made transparent, the Am29325 operates in a purely combinatorial "flow-through" mode.

The two feedback data paths transport processor output operands back to the inputs. The first feedback path routes data from the output of the arithmetic unit to a 32-bit multiplexer at the input of register R; the multiplexer selects the operation result or  $R_{g-31}$ . The other feedback path carries the output of register F to a second 32-bit multiplexer, which selects either register S or register F as the input for port S of the arithmetic unit.

To allow easy interface with a variety of 16- and 32-bit systems, buses R, S, and F can be programmed to operate in one of three I/O modes. The first and most straightforward of these is the 32-bit, 2-input-bus mode; in this mode, the R and S buses are configured as independent 32-bit input buses, the F bus as a 32-bit output bus. The second I/O option is a 32-bit, singleinput-bus mode, in which the R and S operands are taken from a single 32-bit input bus on alternate clock edges. For the third option, a 16-bit, two-input-bus mode, the R, S, and F buses are 16 bits wide. Thirty-two-bit operands are placed on these 16bit buses by time-multiplexing the 16 MSBs and LSBs of each data word during alternate halves of the clock cycle. Internal data paths and registers remain 32 bits wide when this 16-bit I/O mode is selected.

#### ARCHITECTURAL ADVANTAGES FOR DSP APPLICATIONS

The architecture of the Am29325 offers several advantages to the implementor of DSP and array processing systems:

Efficient data routing - Three aspects of the Am29325 architecture contribute to efficient data routing. First, placing the adder/subtractor and multiplier on the same die eliminates the shuffling of data between separate adder/subtractor and multiplier chips. Minimizing chip-to-chip communication is an important consideration in high performance system design, since, in VLSI-based systems, the time needed to transfer data between chips can often limit maximum operating speed.

Second, the on-board data paths allow the intermediate result of a calculation to be routed to the input of the floating point arithmetic unit, for use as an input operand in the next phase of the calculation. This feature not only keeps data on-chip, but also makes an external implementation of a similar data path unnecessary. Such an external data path would be expensive, both in components and circuit-board real estate; implementing the two 32-bit multiplexers alone would consume over a dozen MSI devices.

Third, the absence of pipeline delays in the floating point arithmetic unit makes it possible to use the result of one calculation as the input operand for the very next calculation, a crucial feature when implementing algorithms with tight data feedback loops. Users of floating point processors with pipeline delays have one of two choices when implementing such an algorithm - they can either halt the operation while waiting for the desired result to drop out of the pipeline, thus reducing computational efficiency, or can interleave different sets of calculations to keep the arithmetic unit busy, at the cost of more complicated programming. Using a zero-pipelinedelay arithmetic unit avoids both of these unappealing choices.

Am29325 data routing efficiency is best appreciated by considering the manner in which multiplication-accumulation is performed. In a typical multiplication-accumulation calculation,

N input terms  $\textbf{x}_i$  are multiplied by coefficients  $\textbf{k}_i.$  These products are then added, producing the weighted sum

$$s = \sum_{i=0}^{N-1} k_i x_i$$

Multiplication-accumulation is performed in a two-step loop, with two additional steps for initialization (fig. 2a-d). To initialize the process, data and coefficient values  $x_g$  and  $k_g$ are clocked into registers R and S (fig. 2a). Next, the values  $x_g$  and  $k_g$  are multiplied, and the product placed in register F; at the same time, data and coefficient values  $x_1$  and  $k_1$  are clocked into registers R and S (fig. 2b). In the first step of the multiplication-accumulation loop, values  $x_1$  and  $k_1$  are multiplied, and the product placed in register R (fig. 2c). In the second step, products  $x_1 * k_1$  and  $x_g * k_g$  are added, and their sum placed in register F;  $x_2$  and  $k_2$  are placed in registers R and S (fig. 2d).

The two loop steps are then repeated for as many iterations as needed to complete the calculation. The internal data paths wrap back products and accumulations, thus keeping the arithmetic unit busy with a multiplication or addition every clock cycle; a new multiplication-accumulation is performed every two clock cycles. Partial results remain on-chip until the multiplication-accumulation is completed.

High I/O bandwidth - The three 32-bit I/O buses provide high-bandwidth access to the floating point arithmetic unit. When the device is operated in the 32-bit, two-input-bus I/O mode, no multiplexing of I/O buses is required, thus improving system speed and easing critical timing constraints.

**Transparent operation** - In many applications, the R, S, and F registers will be used to store an operation's inputs and outputs; it is in this register-to-register mode that the Am29325 operates the fastest. In some applications, however, it because system requirements dictate a data path structure substantially different from that provided, or because the floating point operations. These situations can be accommodated by making all three registers transparent, turning the floating point processor into a purely combinatorial device; this "flow-through" mode of operation would not be possible if the Am29325



BUS

2:1

Ċx.

REG

BÚS

REG F

BUS Ū

REG S

R TIMES S

(Xe\*Ke)

(C.)

2:1

(X, \*K, )



<sub>BUS</sub> Ū

(d.)

### SYSTEM DESIGN

A block diagram for a typical small system design is shown in fig. 3. The system consists of an Am29325, an Am29334 fourport register file, data memory, coefficient memory, microprogrammed controller, clock generator, and host system interface. Although small enough to fit on a single circuit board, this system contains all the elements needed for floating point digital-signal and array processing.

Because of its three-bus I/O structure and internal feedback paths, the Am29325 can be used to advantage in both cascade and parallel configurations. Fig. 4 illustrates a simple cascade system, a variation on the previous architecture. In this system, the output port of one floating point processor feeds the input port of another. This arrangement is particularly advantageous when performing high-speed



Fig. 2(a-d): Performing floating point multiplicationaccumulation with the Am29325.

Fig. 3: Typical small-system design.

multiplication-accumulation; the first Am29325 forms products, while the second computes the accumulation in parallel. The accumulation is performed using a feedback data path in the second part - no external feedback path is necessary. By doing the multiplications and additions in parallel, the effective throughput rate is one clock per multiplication-accumulation, twice that of the system shown in fig. 3.



Fig. 4: System using two floating point processors in cascade.

Parallel configurations are also useful, and are easily implemented. In one such configuration, the Am29325 is used with other members of the Am29300 family to create a 32-bit floating-point/integer processor (fig. 5). In the system shown, the Am29322 ALU and the Am29323 32-by-32-bit parallel multiplier share three 32-bit buses with the Am29325; data can be passed from one processor to another through the Am29334 register file. Combining these parts produces a system that can perform highspeed floating point, integer, and logical operations. The user can further expand the system by adding 32-bit operators of his own devising to the three-bus architecture.

### REFERENCES

 A Proposed Standard for Binary Floating-Point Arithmetic, IEEE Floating-Point Working Group, draft 10.0, December 2, 1982.

2. VAX Architecture Handbook, Digital Equipment Corporation, 1981.



Fig. 5: Thirty-two-bit floating point/integer processor.

# DIGITAL FILTER DESIGN MADE EASIER FOR FIRST-TIME USERS

Off-the-shelf components and simplified filter formulas ease entry into the world of digital filter design, and allow a quick evaluation of the cost-effectiveness of digital solutions.



### by Kenn Lamb and Bob Perlman

Realtime digital filtering is becoming an attractive alternative in a growing number of analog-filtering applications. Today, specialized digital signal-processing part families, and a range of filter-design packages, make digital filters easier to implement. Nevertheless, getting into digital filter design for the first time is not easy. Some of the concepts are unfamiliar to novice filter designers and the tools generally available are aimed at the more experienced designers.

A "cookbook" approach, however, eases entry into digital filter design and provides a quick way to evaluate the cost-effectiveness of a digital solution. This approach uses off-the-shelf ingredients, such as the Am29500 family, and a simple "recipe" for a linear phase finite-impulse response (FIR) filter. Copyright by COMPUTER DESIGN, November 1985. Reprinted by permission. A digital filter performs the same function as an analog filter, but in a different "world." In the continuous world, a signal is monitored (or sampled) continuously, and filtering is described mathematically as a convolution operation.

In the discrete world, things can be done much more efficiently. The z-transform, in which powers of z can be equated to simple time delays, provides a formula that is the discrete equivalent of the convolution operation. In addition, sampling reduces the monitoring overhead to periodic snapshots of the signal. A digital filter simply implements the discrete convolution formula after an A-D converter has sampled the signal. Any arithmetic processor can perform the discrete convolution required for a digital filter, but the Am29500 family provides a processing package without the overhead of a microprocessor-based system, and can be optimized for digital-filtering applications.

Unlike the characteristics of an analog filter, a digital filter's characteristics are determined by arithmetic operations and coefficients, rather than by individual component values. This makes digital filtering inherently independent of component aging and environmental variables such as temperature. As a result, reliability is improved, and the filter response

Kenn Lamb is a former product planning engineer for Advanced Micro Devices (Sunnyvale, CA). He holds a BSC in electronics from Imperial College in London.

Bob Perlman is a product planning section manager at AMD. He holds an MSEE from Johns Hopkins University.

can be accurately reproduced. In addition, digital techniques permit useful characteristics that are not easily achieved in analog systems. Among these are zero passband insertion loss, very low frequency operation, and control over the stopband response.

Using digital techniques, a designer can build linear-phase and all-pass filters that modify only the frequency or phase content of a signal. Linear-phase filters are used in multichannel environments where phase information is important, while all-pass filters are used typically for equalization. Cascading allpass and linear-phase filters allows phase and frequency responses to be modified independently. In addition, filter coefficients, which are programmed by the designer, are easier to change than the components of an analog filter.

Because analog filters operate in the continuous world and can handle only one signal at a time, analog designs usually cascade a number of stages. As a result, there is a clear relationship between the physical size of an analog filter and its performance. The bandwidth of the active components within the filter determines the overall filter bandwidth, but spare bandwidth cannot be used for other channels.

Because the digital filter core is an arithmetic section that performs the calculations according to the discrete convolution formula, it may be used for one large, or many small, single-channel filters, or may be a stage in a very large, high-bandwidth filter.

### **Relationship between time and frequency**

The match between design and actual response in an analog filter is determined by the tolerance with which analog components may be constructed; the digital equivalent is the resolution (number of bits) at which the digital coefficients are represented. A relatively lax analog specification places wide tolerance on the component values, which translates into a less accurate (fewer bits) resolution of the digital coefficients. For an analog filter, the dynamic range corresponds to the range between the noise floor and the point at which the signal starts to clip. Dynamic range in a digital filter is determined by the number of bits in the digital representation of the signal.



### What is convolution?

Two continuous functions x(t) and h(t) can be convolved by evaluating the equation:

$$y(t) = \int_{-\infty}^{\infty} x(\pi) \times h(t-\pi) d\pi$$

The discrete-time equivalent of this equation is:

$$y(n) = \sum_{k=0}^{N-1} x(k) \times h(n-k)$$

Although somewhat different in appearance, both forms of the convolution equation can be evaluated similarly. First, the function h is time-reversed, or flipped. Then, function h is shifted left or right, with the amount of shift indicated by time variables t or k. The resulting function is then either integrated (in the continuous case) or multiplied and summed point by point (in the discrete case) with function x.

One unfortunate aspect of convolution is that its worth is not readily apparent from the defining equations presented above. The real power of convolution is best appreciated by considering what happens in the frequency domain when two signals are convolved in the time domain. If the Fourier transforms of continuous signals x(t) and h(t) are X(f) and H(f), the convolution of x(t) and h(t) produces a signal whose frequency spectrum is equal to the product X(f)H(f). In other words, if one wishes to pass signal x(t) through a filter with transfer function H(f), one merely convolves x(t) with h(t). The same principle holds for discrete-time signals, but with the z transform.

The discrete convolution of a sampled signal (a) with a sampled-impulse response (b) may be achieved by the following process. First, flip the impulse response and place it over the signal (c). Sum the products of each impulse sample with its coincident signal sample. The total is the convolution

A signal to be filtered exists in the time domain of the familiar continuous world. The desired filter response for this signal is best represented in the frequency domain of the continuous world. These two domains are related through the continuous Fourier transform. The Fourier transform of a time-varying signal is its frequency spectrum, and the Fourier transform of the filter's frequency response is its impulse response. Viewed in the time domain, a filter's output is determined by the time-varying amplitude of the input signal and the filter's impulse response. In the frequency domain, on the other hand, the signal's frequency spectrum and the filter's frequency response determine the output. The frequency response and the impulse response say the same thing about a filter; the impulse response is result for the particular overlap, and corresponds to the first filter output. Now, slide the impulse response one sample interval to the right, so that it overlaps the next newest signal sample. Repeat the multiply/accumulate sequence. This result is the convolution for the new overlap, and corresponds to the second output from the filter. This process is repeated to calculate each new filter output. After each filter output, the oldest signal sample is no longer required, and may be discarded (d).



simply the time-domain version of the filter's frequency response.

Just the continuous Fourier transform allows movement between the time and frequency domains of the continuous world, the z-transform allows movement directly from the continuous world into the sampled discrete world. Here the discrete time and frequency domains exist, linked by the discrete Fourier transform. Sampled versions of the input signal, the signal's frequency spectrum, and the filter's frequency and impulse response are used.

### **Design tradeoffs**

To understand the design tradeoffs between filter size and performance, a designer must be familiar with two relationships between the time and frequency domains: multiplication in one domain is equivalent to convolution in the other domain, and a signal cannot be duration-limited in both domains. A filter can perform a multiplication in the frequency domain in which the frequency spectrum of an incoming signal is multiplied by the frequency response of the filter, or it can perform a convolution operation in the time domain. For a digital filter, this requires the discrete convolution of a sampled version of the input signal with a sampled version of the filter's impulse response. The latter forms the coefficients of the filter.

The frequency response of the filter will almost certainly be duration-limited in the frequency domain, because it's unusual for a filter to pass all input frequencies. Obvious examples are low-pass and highpass filters, where the aim is the elimination of great chunks of the input signal's frequency content. Because a signal cannot be duration-limited in both the time and frequency domains, a filter such as this will have an infinitely long impulse response. A simplistic approach to shortening this response is to truncate the impulse response to a convenient length.

Truncating the impulse response is equivalent to multiplying it by a function that has a value of one where the impulse response is to be preserved, and a value of zero where the impulse response is truncated. This truncating function is called rect (x) because its amplitude plot describes a rectangle. Since multiplication in the time domain is equivalent to a convolution in the frequency domain, the initial ideal filter response must be convolved with the Fourier transform of the rect (x) function. The Fourier transform is the well-known sinc function. The nature of this sinc function, however, smears the original ideal frequency response. And the more of the impulse response that is discarded, the worse the smearing effects of the sinc function.

A designer can obtain a duration-limited impulse response without wrecking the original filter's frequency response by multiplying the impulse response by a function with characteristics such that when the frequency response of this function is convolved with the desired filter's frequency response, it causes the minimum of smearing distortion. Window functions, such as the Hamming function do just this. A window function such as this has a narrow main lobe to maintain the selectivity of the filter, and small sidelobes to maintain the depth of the stopband.

### **Designing a low-pass filter**

The design of a low-pass filter is particularly easy, because the impulse response is obtained from the Fourier transform of the ideal rectangular response, which then takes the form of a geometric series, with coefficients that can be expressed as the relatively simple sinc function  $\sin (x)/x$ . The number (N) of coefficients (S<sub>n</sub>) needed to implement the filter depends on the sampling rate (usually 2.5 times the maximum frequency in the signal), the cutoff frequency of the filter, and the frequency that defines the start of the stopband. These coefficients must be multiplied by appropriate window function coefficients (W<sub>n</sub>) to yield windowed filter coefficients (C<sub>n</sub>).

These decimal coefficient values must then be converted into 16-bit binary values, using the fractional two's complement numbering scheme. The coefficients are normally stored within registered PROMs, although RAM or EPROM storage offers advantages during development. Most A-D converters support the two's complement numbering scheme, ensuring that the representations of the data and coefficients are compatible.

Filter hardware is used to implement the discrete convolution operation given by the formula:

$$Y_k = \sum_{n=1}^{N} C_n \times X_{(k-n+1)}$$

where  $X_k$  and  $Y_k$  are the kth filter inputs and outputs, respectively. A filter implemented with this equation uses N data inputs to compute each filter output, and is therefore referred to as an N-tap or N-point filter.

One advantage of a linear phase filter is that the coefficients are symmetrical. This means the required number of multiplications and the size of the coefficient store can be halved by adding the two data points that will be multiplied by the same coefficient value. The modified formula is:

$$Y_{k} = \sum_{n=1}^{N/2} C_{n} \times [X_{(k-n+1)} + X_{(k-N+n)}]$$

The filter hardware used to implement this formula consists of three distinct sections: the address generation and control, the arithmetic section that performs the number crunching, and the store. The store holds a short time history of N samples for each of the channels to be filtered. These samples are held in a cyclic buffer with a length equal to the next integer power of two greater than the number of points in the filter. For each processed sample output from the filter, a new sample must be written in. This new sample overwrites the oldest sample within the cyclic buffer.

For each output from the filter, the N newest samples must be read into the arithmetic section so the discrete convolution operation can be performed. Two pointers are needed—one (which counts up) to indicate the write address for incoming samples, and and one (which counts down) to indicate the read addresses for the discrete convolution operation.

The store will be required to read two data points for every cycle of the multiplier. The second data point may be found at an offset from the existing read pointer. An Am29PL141 fuse-programmable controller supplies this offset, and the modified address is calculated by an 8-bit adder formed from two Am25LS2517 ALUs.

For a six-point filter, the offsets (0, 3, 5, and 7) are applied, on alternate cycles, to the 8-bit adder by the Am29PL141, to permit generation of the second read address from the first read address.

For a six-point filter, all of the address calculations are performed modulo 8 (the next power of 2 greater than 6), which is done by masking the address applied to the store so that the store sees only the least significant three bits. After each new word is written into the store, the write pointer loads the read pointer with the correct cyclic buffer start address for the calculation of the next filter output.

Multiple channels may be accommodated by inhibiting the increments of the write pointer until all the channels have input a new sample. The channels are counted by the loop counter within the Am29PL141 fuse-programmable controller, and are separated in the store section by the high-order address bits latched with each new input. The channels may be presented in any order, but all channels must be processed at the same rate. When shifted one position so that the least significant bit is discarded, the values forming the offset sequence give

### Determining the coefficients for a low-pass filter

The coefficients for a low-pass filter can be expressed in terms of the so-called sinc function. The number of coefficients required to implement the filter N is given by:

$$N = 4F_{s}/(F_{cs} - F_{cp})$$

 $F_{\rm s}$ , the sampling rate of the A-D converter, is equal to 2.5 times  $F_{\rm o}$ , the maximum frequency present in the input signal (set by the antialiasing filter).  $F_{\rm cp}$  is the cutoff frequency of the filter (the end of the passband) and  $F_{\rm cs}$  is the frequency that defines the start of the stopband (the end of the transition band).

This estimate for N (the number of coefficients) is usually conservative, so the value of N may be reduced safely by about 10 percent. This leeway allows an even value for N to be obtained. Having determined N, the coefficient values can be obtained by sampling the filter's impulse response. The required coefficients are given by the following sinc functions:

For n = 1 to N  

$$S_{n} = \frac{\sin [2 \times \pi \times F_{cp} \times (n - (N + 1)/2)/F_{s}]}{\pi \times (n - (N + 1)/2)}$$

where the  $S_n$  values are the samples of the sinc function. These values must then be multiplied by the window function to yield the windowed filter coefficients ( $C_n$ ):

$$C_n = S_n \times W_n$$

where  $W_n$  (the Hamming window coefficients) are given by:

$$W_n = 0.54 + 0.46 \times \cos [2 \times \pi \times (n - (N + 1)/2)/(N - 1)]$$

Choosing  $F_o$  equal to 5 kHz,  $F_{cp}$  equal to 3 kHz, and  $F_{cs}$  equal to 4.5 kHz gives an N equal to 33. Reducing this by 10 percent makes N equal to 30. Inserting the values for  $F_s$ ,  $F_{cp}$ , and N into the above equations gives:

 $\begin{array}{l} S_n = [\sin{(0.48 \times \pi \times (n - 15.5))}] / (\pi \times (n - 15.5)) \\ W_n = 0.54 + 0.46 \ x \cos{(2 \ x \ \pi \times (n - 15.5)/29)} \end{array}$ 

This gives the C<sub>n</sub> coefficients listed in the table.

| 30-Point Low-Pass Filter Coefficients |                                         |                                             |                                             |  |  |  |  |  |
|---------------------------------------|-----------------------------------------|---------------------------------------------|---------------------------------------------|--|--|--|--|--|
| Coefficient<br>Index<br>(n)           | Impulse<br>Samples<br>(S <sub>n</sub> ) | Window<br>Coefficients<br>(W <sub>n</sub> ) | Filter<br>Coefficients<br>(C <sub>n</sub> ) |  |  |  |  |  |
| 1                                     | 0.00275                                 | 0.080                                       | 0.00022                                     |  |  |  |  |  |
| 2                                     | 0.02353                                 | 0.090                                       | 0.00212                                     |  |  |  |  |  |
| 3                                     | 0.00000                                 | 0.122                                       | 0.00000                                     |  |  |  |  |  |
| 4                                     | -0.02762                                | 0.173                                       | -0.00478                                    |  |  |  |  |  |
| 5                                     | -0.00380                                | 0.242                                       | -0.00092                                    |  |  |  |  |  |
| 6                                     | 0.03291                                 | 0.324                                       | 0.01066                                     |  |  |  |  |  |
| 7                                     | 0.00931                                 | 0.417                                       | 0.00388                                     |  |  |  |  |  |
| 8                                     | -0.04036                                | 0.515                                       | -0.02078                                    |  |  |  |  |  |
| 9                                     | -0.01803                                | 0.614                                       | -0.01117                                    |  |  |  |  |  |
| 10                                    | 0.05236                                 | 0.710                                       | 0.03717                                     |  |  |  |  |  |
| 11                                    | 0.03407                                 | 0.798                                       | 0.02718                                     |  |  |  |  |  |
| 12                                    | -0.07679                                | 0.874                                       | -0.06711                                    |  |  |  |  |  |
| 13                                    | -0.07484                                | 0.934                                       | -0.06990                                    |  |  |  |  |  |
| 14                                    | 0.16350                                 | 0.976                                       | 0.15958                                     |  |  |  |  |  |
| 15                                    | 0.43580                                 | 0.997                                       | 0.43449                                     |  |  |  |  |  |
| 16                                    | 0.43580                                 | 0.997                                       | 0.43449                                     |  |  |  |  |  |
| 17                                    | 0.16350                                 | 0.976                                       | 0.15958                                     |  |  |  |  |  |
| 18                                    | -0.07484                                | 0.934                                       | -0.06990                                    |  |  |  |  |  |
| 19                                    | -0.07679                                | 0.874                                       | -0.06711                                    |  |  |  |  |  |
| 20                                    | 0.03407                                 | 0.798                                       | 0.02718                                     |  |  |  |  |  |
| 21                                    | 0.05236                                 | 0.710                                       | 0.03717                                     |  |  |  |  |  |
| 22                                    | -0.01803                                | 0.614                                       | -0.01117                                    |  |  |  |  |  |
| 23                                    | -0.04036                                | 0.515                                       | -0.02078                                    |  |  |  |  |  |
| 24                                    | 0.00931                                 | 0.417                                       | 0.00388                                     |  |  |  |  |  |
| 25                                    | 0.03291                                 | 0.324                                       | 0.01066                                     |  |  |  |  |  |
| 26                                    | -0.00380                                | 0.242                                       | -0.00092                                    |  |  |  |  |  |
| 27                                    | -0.02762                                | 0.173                                       | -0.00478                                    |  |  |  |  |  |
| 28                                    | 0.00000                                 | 0.122                                       | 0.00000                                     |  |  |  |  |  |
| 29                                    | 0.02353                                 | 0.090                                       | 0.00212                                     |  |  |  |  |  |
| 30                                    | 0.00275                                 | 0.080                                       | 0.00022                                     |  |  |  |  |  |

| Cycle<br>Number                                          | Write<br>Count                                                           | Read<br>Count                                                            | Offset                                                                                               | Read<br>Address                                                     | Write<br>Address                                                                                                             | Add                                                                                 | Mult                                                                                     | Output                                                   |
|----------------------------------------------------------|--------------------------------------------------------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|----------------------------------------------------------|
| 1                                                        | 0                                                                        | load                                                                     | -                                                                                                    | -                                                                   | 0                                                                                                                            | * * *                                                                               | * * * * * *                                                                              |                                                          |
| 2                                                        | 0                                                                        | 0                                                                        | 0                                                                                                    | 0                                                                   | -                                                                                                                            | * * *                                                                               | * * * * * *                                                                              |                                                          |
| 3                                                        | 0                                                                        | 0                                                                        | 3                                                                                                    | 3                                                                   | -                                                                                                                            | • `                                                                                 | *****                                                                                    |                                                          |
| 4                                                        | 0                                                                        | 7                                                                        | 0                                                                                                    | 7                                                                   | -                                                                                                                            | 0+3                                                                                 | *****                                                                                    |                                                          |
| 5                                                        | 0                                                                        | 7                                                                        | 5                                                                                                    | 4                                                                   | -                                                                                                                            | 0+3                                                                                 |                                                                                          | * *                                                      |
| 6                                                        | 0                                                                        | 6                                                                        | 0                                                                                                    | 6                                                                   | -                                                                                                                            | 7 + 4                                                                               | $0 + 3 \times C1$                                                                        |                                                          |
| 7                                                        | 0                                                                        | 6                                                                        | 7                                                                                                    | 5                                                                   | -                                                                                                                            | 7 + 4                                                                               | 0 + 3 × C1                                                                               |                                                          |
| 8                                                        | 1                                                                        | load                                                                     | -                                                                                                    | -                                                                   | 1                                                                                                                            | 6 + 5                                                                               | 7+4×2                                                                                    |                                                          |
| 9                                                        | 1                                                                        | 1                                                                        | 0                                                                                                    | 1                                                                   | -                                                                                                                            | 6 + 5                                                                               | 7 + 4 × C2                                                                               |                                                          |
| 10                                                       | 1                                                                        | 1                                                                        | 3                                                                                                    | 4                                                                   | -                                                                                                                            |                                                                                     | 6+5×C3                                                                                   |                                                          |
| 11                                                       |                                                                          | 0                                                                        | 0                                                                                                    | 0                                                                   | -                                                                                                                            | 1+4                                                                                 | 6 + 5 × C3                                                                               |                                                          |
| 12                                                       | 1                                                                        | 0                                                                        | 5                                                                                                    | 5                                                                   | -                                                                                                                            | 1+4                                                                                 |                                                                                          | Y1                                                       |
| 13                                                       | 1                                                                        | 7                                                                        | 0                                                                                                    | 7                                                                   | -                                                                                                                            | 0 + 5                                                                               | $1 + 4 \times C1$                                                                        |                                                          |
| 14                                                       | 1                                                                        | 7                                                                        | 7                                                                                                    | 6                                                                   | -                                                                                                                            | 0+5                                                                                 | $1 + 4 \times C1$                                                                        |                                                          |
| 15                                                       | 2                                                                        | load                                                                     | -                                                                                                    | -                                                                   | 2                                                                                                                            | 7+6                                                                                 | $0+5\times C2$                                                                           |                                                          |
| 16<br>17                                                 | 2                                                                        | 2                                                                        | 0                                                                                                    | 2<br>5                                                              | -                                                                                                                            | 7 + 6                                                                               | $0+5\times C2$                                                                           |                                                          |
| 18                                                       | 2                                                                        | 2                                                                        | 3<br>0                                                                                               | 5                                                                   | -                                                                                                                            | 2 + 5                                                                               | 7+6×C3<br>7+6×C3                                                                         |                                                          |
| 18                                                       | 2                                                                        | 1                                                                        | 5                                                                                                    |                                                                     | -                                                                                                                            | 2 + 5<br>2 + 5                                                                      | 7+6×C3                                                                                   | ~~                                                       |
| 20                                                       | 2                                                                        | 1                                                                        | 0                                                                                                    | 6<br>0                                                              | -                                                                                                                            | 2+5<br>1+6                                                                          | $2 + 5 \times C1$                                                                        | Y2                                                       |
| 20                                                       | 2                                                                        | 0                                                                        | 7                                                                                                    | 7                                                                   | -                                                                                                                            | 1+6                                                                                 | $2+5\times C1$<br>2+5×C1                                                                 |                                                          |
| 21                                                       | 3                                                                        | load                                                                     | /                                                                                                    | -                                                                   | 3                                                                                                                            | 0 + 7                                                                               | $2+5\times C1$<br>1+6×C2                                                                 |                                                          |
| 22                                                       | 3                                                                        | 3                                                                        | 0                                                                                                    | 3                                                                   | 3                                                                                                                            | 0 + 7                                                                               | $1 + 6 \times C2$<br>1 + 6 × C2                                                          |                                                          |
| 23                                                       | 3                                                                        | 3                                                                        | 3                                                                                                    | 6                                                                   | -                                                                                                                            |                                                                                     | $0 + 7 \times C3$                                                                        |                                                          |
| 25                                                       | 3                                                                        | 2                                                                        | 0                                                                                                    | 2                                                                   | -                                                                                                                            | 3+6                                                                                 | $0 + 7 \times C3$<br>0 + 7 × C3                                                          |                                                          |
| 26                                                       | 3                                                                        | 2                                                                        | 5                                                                                                    | 7                                                                   | _                                                                                                                            | 3+6                                                                                 | 017 × 00                                                                                 | Y3                                                       |
|                                                          |                                                                          | -                                                                        | Ű                                                                                                    |                                                                     |                                                                                                                              |                                                                                     |                                                                                          | 10                                                       |
| • •                                                      |                                                                          | . 1                                                                      |                                                                                                      | •                                                                   |                                                                                                                              |                                                                                     |                                                                                          |                                                          |
| seventh<br>ing the<br>cycles a<br>least sig<br>cycle, th | cycle. Read<br>read count<br>re required<br>gnificant ha<br>re most sign | addresses<br>to the off<br>for each add<br>alves being<br>nificant in tl | the read cou<br>are obtained<br>set modulo<br>d operation,<br>added in<br>he second.<br>es of the da | d by add- a<br>o 8. Two o<br>with the p<br>the first c<br>The Add p | plier/accumula<br>dder, and ther<br>peration. The<br>rior add opera<br>oefficients. The<br>ut column. The<br>oultiplying the | efore require<br>Mult column<br>tions being r<br>e filter output<br>e filter coeffi | es two cycles f<br>shows the re<br>multiplied by t<br>is are shown in<br>cients are obta | or each<br>sults of<br>he filter<br>the out-<br>ained by |

the new sequence (0, 1, 2, 3) required to address the coefficients. An Am29520 must implement a two-cycle delay to ensure that the address is made available to the registered PROMs when needed.

The arithmetic section consists of a single 8-bit Am29501 and a 16-bit Am29L510. The 501 performs a 16-bit addition for every cycle of the 510, and operates at twice the clock rate of the 510. An additional 9-bit Am29823 register latches the least significant eight bits and the carry out of the add operation as it is performed by the 501. The 510 performs the multiply/accumulate operations required by the discrete convolution process; the guard bits within the accumulator accommodate word growth.

There are two store requirements within the filter—the data and the coefficients. The data resides in two 2-kword, 8-bit wide Am9128 RAMs that cycle at the same rate as the 501. The coefficients reside in two 8-bit wide Am27S25 registered PROMs that cycle at the rate of the 510.

This filter structure accommodates up to 16 channels, each filtered by one stage of a linear phase FIR filter that has up to 128 points. Each point requires 50 ns to process, yielding an effective sampling rate of 40 kHz for each of the 16 channels using the lowpass filter described in the panel "Determining the coefficients for a low-pass filter." There is sufficient capacity within the PROMs for 32 different sets of filter coefficients, allowing a different filter to process each channel. Outputs may be returned as inputs to a different channel, allowing steeper rolloffs by cascading filters, or more complicated responses by cascading low- and high-pass filters.

This filter structure can be configured to yield a filter bank capable of resolving up to 32 spectral components from an input signal sampled at 20 kHz (using filters of similar complexity to the low-pass one in the panel). This type of filter bank provides a stable platform for automatic speech-recognition algorithms. If 20 channels are dedicated to resolving

spectral components, 37 percent of the available processing bandwidth remains for squaring and integrating the outputs from each filter. A low-pass filter performs integration, and threshold detection may be performed with the Am29501.

The hardware described can perform all of the processing required to generate the "intensity" spectrum of the input signal, which is the major processing requirement of a speech-recognition system.

One of the advantages of a filter bank implementation is that the signal spectrum may be broken into completely arbitrary divisions (20 in this example). An FFT implementation, on the other hand, would require a linear scale, while decimating filter techniques would require divisions linked by integer ratios. A useful scale not usually available has a logarithmic spacing of center frequencies and bandwidths. This logarithmic spacing may be specified at the design stage. The band of interest ranges from 400 Hz to 8 kHz, which may be satisfactorily sampled at a rate of 20 kHz. Each of these center frequencies must be entered into the band-pass filterimpulse response equation, with the associated bandwidth, in order to generate the filter coefficients.

The design of a band-pass filter using windowedimpulse techniques follows the same steps as the low-

### Determining the coefficients for a band-pass filter

The coefficients for a band-pass filter are derived from the following variation of the low-pass filterimpulse response formula:

$$S_{n} = \frac{\sin [\pi \times B \times (n - (N + 1)/2)/F_{s}] \times 2 \times \cos}{\frac{[2 \times \pi \times F_{c} \times (n - N + 1)/2/F_{s}]}{\pi \times (n - (N + 1)/2)}}$$

where B is the 3dB bandwidth,  $F_c$  is the center frequency,  $F_s$  is the sampling frequency, N is the number of coefficients, and n is the coefficient index. The same Hamming window formula used for the low-pass filter may be used here.

Specifying each filter to have the same number of coefficients simplifies the overall control of the filter bank. Up to 32 coefficients may be used for each filter without exceeding the available processing bandwidth; a reasonable design specification would set N equal to 32, with  $F_s$  at 20 kHz. Inserting these values in the impulse response formula yields:

$$S_{n} = \frac{\frac{\sin [\pi \times B \times (n - 16.5)/20000] \times 2 \times \cos}{[\pi \times F_{c} \times (n - 16.5)/10000]}}{\pi \times (n - 16.5)}$$

Substituting the 20 pairs of values for  $F_c$  and B into this formula will, after multiplying by the appropriate window term, yield the required filter coefficients. These coefficients must then be converted into a fractional two's complement format, and programmed into the PROMs.

pass filter design, with two differences: the number of coefficients roughly doubles because there are two distinct stopbands within the band-pass filter response, and the coefficients are computed using a modified formula.

Because a single input channel will pass through 20 different band-pass filters, the same allocation of 20 channels is made here as in the low-pass filter, but there is a slight difference in the control sequence. Previously, a new sample was written into the cyclic buffer each time a processed point came out of the filter. But, for this application, a new sample is written into the cyclic buffer for every 20 output points from the filter. This means modifying the controller microcode so that the write pointer increments are inhibited until 20 outputs have been calculated.

## CHAPTER 6 PRODUCT SPECIFICATIONS

### Am29501A

Multi-Port Pipelined Processor (Byte-Slice<sup>™</sup>)

### ADVANCED INFORMATION

- DISTINCTIVE CHARACTERISTICS
- Expandable Byte-Slice<sup>TM</sup> Register-ALU

   Speed improved version of the Am29501
  - Eight instruction ALU
    - Four arithmetic operations
  - Four logic operations
  - Force/Inhibit carry modes
  - Flexible expansion has carry and P/G
- Three I/O ports for maximum system interconnect flexibility
- Ten internal data paths
  - Highly parallel architectures
  - Multiple simultaneous data manipulations
- Pipelining register file has six 8-bit registers
  - Multilevel pipelining
  - Multiple register-to-register moves
- Completely microprogrammable
  - No instruction encoding
  - All operation combinations available

### **GENERAL DESCRIPTION**

The Am29501A is an expandable Byte-Slice<sup>TM</sup> register-ALU designed to bring maximum speed to array processor and digital signal processor systems. It provides a flexible processor building block for implementing highly pipelined, highly parallel architectures where speed is achieved by a combination of optimized integrated circuit technology (IMOX<sup>TM</sup> process and internal ECL circuitry) and customized system architecture. I/O port flexibility and multiple concurrent data moves make it possible to construct processors capable of very high throughput. Parallel processors are especially efficient for array/vector operations or signal processing algorithms requiring complex number arithmetic (e.g. FFT, convolution, correlation, etc.). The Am29501A's Pipeline Register File provides data storage and pipelining flexibility. Any combination of register instructions, ALU instructions, and I/O instructions can be microprogrammed to occur in the same cycle. This allows overlap of external multiplication, ALU operations, and memory I/O.

Three I/O ports support a wide variety of parallel, pipelined architectures by providing separate I/O ports for the multiplier and the memory data bus. Either of two bidirectional I/O ports, DIO and MIO, can interface to the data bus or multiplier Y-input port. A separate MI port connects to the multiplier output port.



Am29C509 12 x 12 CMOS Multiplier Accumulator

### **ADVANCED INFORMATION**

### DISTINCTIVE CHARACTERISTICS

- High-Speed 1.6µ CMOS Process The Am29C509, at 50 ns maximum clock rate, supports real-time process
- 27-bit Product Accumulation Result Provides 24-bit product plus 3-bit extended product
- Accumulator Function
   The accumulator value may be altered by LOAD, ADD and SUBTRACT instruction
- Output Register Preload
   The predetermined value can be loaded into the output
   register
- Round Control The most significant 12 bits of the product rounded to the value nearest to the full 24-bit product
- Accept 2's Complement or Unsigned Inputs

### **GENERAL DESCRIPTION**

The Am29C509 is a high-speed 12-bit x 12-bit multiplier/ accumulator (MAC). The X and Y input registers accept 12bit inputs in two's complement or unsigned magnitude format. A third register stores the Two's Complement (TC) and Round (RND), Accumulate (ACC), and Subtraction SUB/ADD control bits. This register is clocked whenever the X or Y input registers are clocked. The 27-bit accumulator/output register contains the full 24bit multiplier output which is sign extended or zero-filled based on the TC control bit. The accumulator can also be preloaded from an external source through the bidirectional P-port. The operation <u>of the</u> accumulator is controlled by the signals ACC, SUB/ADD, and PREL (Preload). Each of the input registers and output register has independent clocks.



### **RELATED PRODUCTS**

| Part No.      | Description                          |
|---------------|--------------------------------------|
| Am29526/527   | High speed sine function generator   |
| Am29528/529   | High speed cosine function generator |
| Am29540       | Programmable FFT address sequencer   |
| Am29520A/521A | Multilevel pipeline registers        |
| Am29524/525   | Multilevel pipeline registers        |

## Am29510/L510

16 x 16 Multiplier Accumulator

### PRELIMINARY

### DISTINCTIVE CHARACTERISTICS

- Uses two's complement or unsigned inputs and outputs
- Round control
- Output register preload
- 35-bit product accumulator result
  - 32-bit product
- 3-bit extended product

- IMOX<sup>™</sup> processing
  - ECL internal circuitry for speed
  - TTL I/O, single 5V supply
- Fast
  - High speed version multiply accumulate time 80ns
  - Low power version multiply accumulate time 110ns

### **GENERAL DESCRIPTION**

The Am29510 is a high-speed 16 x 16-bit multiplier/ accumulator (MAC). The X and Y input registers accept 16-bit inputs in two's complement or unsigned magnitude format. A third register stores the Two's Complement (TC) and Round (RND) control bits. This register is clocked whenever the X or Y input registers are clocked.

The 35-bit accumulator/output register contains the full 32bit multiplier output which is sign extended or zero-filled based on the TC control bit. The accumulator can also be preloaded from an external source through the bidirectional P port. The operation of the accumulator is controlled by the signals ACC (Accumulator), SUB/ADD (Subtraction/Addition), and PREL (Preload). Each of the input registers and output register has independent clocks.

The Am29L510 is a low-power version of the Am29510. The Am29L510 consumes only one-half the power of its standard power counterpart while maintaining nearly two-thirds the speed.



Multilevel pipeline registers

Am29520/21

## Am29C516/C17 Family

ADVANCED INFORMATION

### DISTINCTIVE CHARACTERISTICS

- High-Speed 1–6μ CMOS Process Supports 70 ns clocked multiply with 0.5W power dissipation.
- Full Product Multiplexed at Output MSP and LSP routed to output port via multiplexer.
- Two's Complement, Unsigned or Mixed Operands
- Am29C516: Pin-for-Pin Compatible with Am29516 and MPY16HJ
- Am29C517: Pin-for-Pin Compatible with Am29517.
   Has Single Clock Input with Register Enables (Useful for Microprogrammed Systems)

### GENERAL DESCRIPTION

The Am29C516 and Am29C517 are high-speed parallel 16 x 16-bit multipliers utilizing internal CMOS logic to generate a 32-bit product. Two 17-bit input registers are provided for the X and Y operands and their associated mode controls  $X_M$  and  $Y_M$ . These mode controls are used to specify each operand as either two's complement or unsigned numbers. When one operand is two's complement and the other is unsigned, the result is two's complement.

At the output of the multiplier array, a format adjust control (FA) allows the user to select either a full 32-bit product or a left-shifted 31-bit product suitable for two's complement only.

Two 16-bit output registers are provided to hold the most and least significant halves of the product (MSP and LSP) as defined by FA. For asynchronous output, these registers may be made transparent by taking the feed through control (FT) High. A round control (RND) allows the rounding of the MSP; this control is registered, and is entered whenever either input register is clocked. The two halves of the product may be routed to a 16-bit 3-state output port (P) via a multiplexer, and in addition, the LSP is connected to the Y-input port through a separate three-state buffer.

The Am29C516 X, Y, MSP and LSP registers have independent clocks (CLKX, CLKY, CLKM, CLKL). The output multiplexer control (MSPSEL) uses a pin which is a supply ground in the TRW MPY16HJ. When this control is LOW, the function is that of the MPY16HJ, thus allowing full compatibility.

The Am29C517 differs in that it has a single clock input (CLK) and three register enables (ENX, ENY, ENP) for the two input registers and the entire product, respectively. This facilitates the use of the part in microprogrammed systems. In both parts data is entered into the registers on the positive edge of the clock.



### RELATED PRODUCTS

| Part No.     | Description                   |
|--------------|-------------------------------|
| Am29501/A    | Multiport pipelined processor |
| Am29526/27   | Sine function generator       |
| Am29528/29   | Cosine function generator     |
| Am29520A/21A | Pipeline register             |
| Am29540      | Address generator             |

## Am29520 • Am29521

**Multilevel Pipeline Registers** 

### DISTINCTIVE CHARACTERISTICS

- Four 8-bit high speed registers
- Dual two-level or single four-level push-only stack operation
- All registers available at multiplexed output
- Hold, transfer and load instructions
- · Provides temporary address or data storage
- 24-pin 0.3" package

### **GENERAL DESCRIPTION**

The Am29520 and Am29521 each contain four 8-bit positive edge-triggered registers. These may be operated as a dual 2-level pipeline or as a single 4-level pipeline. A single 8-bit input is provided and all four registers are available at the 8-bit, 3-state output.

The Am29520 and Am29521 differ only in the way data is loaded into and between the registers in dual 2-level operation. This difference is illustrated in Figure 1. In the Am29520 when data is entered into the first level (I=2 or I=1) the existing data in the first level is moved to the second level. In the Am29521 these instructions simply cause the data in the first level to be overwritten. Transfer of data to the second level is achieved using the 4-level shift instruction (I=0). This transfer also causes the first level to change. In either part I=3 is a NO-OP.



### Am29524/Am29525

Dual 7-/8-Deep Pipeline Registers

### PRELIMINARY

### **DISTINCTIVE CHARACTERISTICS**

- · Provides temporary address or data storage
- Am29524 Dual 7-deep or single 14-deep (with feedthrough and 0) registers
- Am29525 --- Dual 8-deep or single 16-deep registers
- · Hold, or shift and load instructions

- High-speed ECL-internal technology, TTL-compatible I/O, packaged in 28-pin 0.4-inch (400 Mil) sidebrazed DtPs
- All registers available at three-state output

### **GENERAL DESCRIPTION**

The Am29524/Am29525 are 8-bit wide, 14- and 16-word deep pipeline registers. The registers are organized as two 7- or 8-byte shift registers. By microinstruction, data may be entered into either shift register.

In the Am29524, the shift registers are 7 deep. All fourteen registers are available at the output, as well as the input data fed directly to the output, or an all-zero byte. In the

Am29525, the shift registers are 8 deep and any of the sixteen registers may be output. In both devices, the two shift-register stacks may be cascaded to form a single 14-or 16-byte-long pipeline register.

A single clock is provided and operation of the shift registers is under microprogram control. The device has three-state outputs.



Am29526 • Am29527 Am29528 • Am29529 High Speed Sine, Cosine Generators

### DISTINCTIVE CHARACTERISTICS

- Provides values for sine/cosine functions in π/2048 increments
- Outputs are 16-bit two's complement fractions
- Fast generation time of 50ns max Com'l
- S/LS compatible
- Three-state outputs
- IMOX<sup>™</sup> processing

### **RELATED PRODUCTS**

| Part No.   | Description                        |  |  |  |  |  |  |
|------------|------------------------------------|--|--|--|--|--|--|
| Am29516/17 | 16 x 16-Bit High Speed Multipliers |  |  |  |  |  |  |
| Am29510    | 16 x 16-Bit Multiply Accumulator   |  |  |  |  |  |  |
| Am29540    | FFT Address Sequencer              |  |  |  |  |  |  |
| Am29825    | High Performance 8-Bit Register    |  |  |  |  |  |  |



### FUNCTIONAL DESCRIPTION

The Am29526/27 and Am29528/29 provide high speed generation of sine and cosine functions over the range  $0 \le \theta < \pi$  in increments of  $\pi/2048$ .  $\theta$  is determined by an 11-bit input word. Each device provides an 8-bit output and two are used to give the full 16-bit value. The Am29526 and Am29527 generate the MS and LS bytes respectively for the sine function. Similarly, the Am29528 and Am29529 generate the cosine functions.

The outputs are fractional two's complement numbers with the radix point located immediately to the right of the sign bit (in between the bits weighted  $-2^{\circ}$  and  $2^{-1}$ ). As this format does not allow for the representation of +1 the functions generated are  $-\sin\theta$  and  $-\cos\theta$ . In this way the output values are restricted to the range  $-1 \le f(\theta) < +1$  which is representable. The outputs are three-state with one active Low enable and two active High enable.

While providing general purpose sine and cosine function capability, the Am29526/27/28/29 satisfy the requirements of the Am29540 FFT Address Sequencer.



IMOX and Chip-Pak are trademarks of Advanced Micro Devices, Inc.

Am29540 Programmable FFT Address Sequencer

### **DISTINCTIVE CHARACTERISTICS**

- · Generates data and coefficient addresses
- Programmable transform length 2 to 65,536 points
- Radix-2 or Radix-4
- Decimation in frequency (DIF) or decimation in time (DIT) FFT algorithms supported
- In-place or non-in-place transformation
- 40-pin DIP package
- 5 volt single supply

### **GENERAL DESCRIPTION**

The Am29540 Fast Fourier Transform Address Sequencer generates all the data (RAM) and coefficient (ROM) addresses necessary to perform the repetitive butterfly operations of the FFT. Decimation in time and decimation in frequency algorithms are supported (control DIT/DIF) in radix-2 or radix-4 (RADIX 4/2). A radix-2 real valued input (RVI) transform is also supported. For radix-2 operation the transform length is programmable in powers of 2 from 2 to 65,536 points. In radix-4 the range is 4 to 65,536 in powers of 4.

Address sequences can be selected to be compatible with data which may or may not have been pre-scrambled ("bit-reversed"). If the data has been pre-scrambled the control PSD must be LOW to select the correct sequence. If the data is not pre-scrambled and an in-place transform is performed, the output data will necessarily be in bit-reversed order. If this is not desirable, alternate addresses are available for a non-in-place, non-bit-reversing algorithm.

The butterfly counter operates on the positive clock edge and responds to four instructions. COUNT causes the counter to increment to the next butterfly. RESET causes the counter to initialize for the specified transform length. RESET/LOAD causes the counter to initialize and a data address offset to be loaded into the part via the bi-directional 3-state AD-DRESS port. This offset is effectively OR-ed onto the higher significance bits of the address which are unused for the selected transform length. A HOLD instruction is also provided. Three status lines are provided. EVEN/ODD (KNZ/ KZ) controls the alternation of read and write memories for non-in-place transforms and determines the butterfly structure in the RVI transform. The flag has the function KNZ/KZ when RVI data addresses are selected (AS = 12 to 15). Iteration complete (IT COMP) flags the bottom of a "column" of butterflies and is used in conjunction with block floating point schemes. FFT COMP identifies the last butterfly of the transform



### Am29323

32-Bit Parallel Multiplier

### ADVANCED INFORMATION

### DISTINCTIVE CHARACTERISTICS

- 32-Bit Three-Bus Architecture
  - The device has two 32-bit input ports and one 32-bit output port with maximum multiply time of 80ns
  - Single Clock with Register Enables - The Am29323 is controlled by one clock with individual register enables
- Supports Multiprecision Multiplication

   The device has dual 32-bit registers on each data input port to perform multiprecision multiplication
- Registers can be made transparent
  - Input and output registers can be made transparent independently to eliminate unwanted pipeline delay
- Supports Two's Complement, Unsigned or Mixed Numbers
- Data Integrity Through Master-Slave Mode and Parity Check/Generate
  - Parity check/generate catches inter-device connection errors and master/slave mode provides complete function check

### **GENERAL DESCRIPTION**

The Am29323 is a high-speed 32 x 32-Bit Parallel Multiplier with 67-Bit Accumulator. The part is designed to maximize system level performance by providing a 32-bit three bus architecture and a single clock with register enables.

The Am29323 further enhances the system throughput by providing individual register feedthrough controls, byte parity checking on both input ports and generation on the output port, and dual input registers on each data input bus to support multiprecision multiplication. The Am29323 can manage a wide variety of data types, including two's complement, unsigned, or mixed mode input formats. A 64 x 64-bit multiplication can be performed in seven clock cycles, including input and output. Additional features provided are a format adjust control allowing for standard output or left shifted output suitable for fractional two's complement arithmetic, rounding, and master/slave operation.

The Am29323 is designed with the IMOX<sup>\*</sup> process, which allows internal ECL circuits with TTL-compatible I/O. The device is housed in a 168-lead pin-grid-array package.



\*IMOX is a trademark of Advanced Micro Devices, Inc.

This document contains information on a product under development at Advanced Micro Devices, Inc. The information is intended to help you to evaluate this product. AMD reserves the right to change or discontinue work on this proposed product without notice.



### DISTINCTIVE CHARACTERISTICS

- Single VLSI device performs high-speed floating-point arithmetic
  - Floating-point addition, subtraction and multiplication in a single clock cycle
  - Internal architecture supports sum-of-products, Newton-Raphson division
- 32-bit, 3-bus flow-through architecture
  - Programmable I/O allows interface to 32- and 16-bit systems
- IEEE and DEC formats
  - Performs conversions between formats
- Six flags indicate operation status
- · Register enables eliminate clock skew
- Input and output registers can be made transparent independently

### **GENERAL DESCRIPTION**

The Am29325 is a high-speed floating-point processor unit. It performs 32-bit single-precision floating-point addition, subtraction, and multiplication operations in a single LSI integrated circuit, using the format specified by the proposed IEEE floating-point standard P754. The DEC singleprecision floating-point format is also supported. Operations for conversion between 32-bit integer format and floatingpoint format are available, as are operations for converting between the IEEE and DEC floating-point formats. Any operation can be performed in a single clock cycle. Six flags – invalid operation, inexact result, zero, not-a-number, overflow, and underflow – monitor the status of operations.

The Am29325 has a 3-bus, 32-bit architecture, with two input buses and one output bus. This configuration provides

high I/O bandwidth, allows access to all buses and affords a high degree of flexibility when connecting this device in a system. All buses are registered, with each register having a clock enable. Input and output registers may be made transparent independently. Two other I/O configurations, a 32-bit, 2-bus architecture and a 16-bit, 3-bus architecture, are user-selectable, easing interface with a wide variety of systems. Thirty-two-bit internal feedforward data paths support accumulation operations, including sum-of-products and Newton-Raphson division.

Fabricated with the high-speed IMOX<sup>™</sup> bipolar process, the Am29325 is powered by a single 5-volt supply. The device is housed in a 144-pin pin-grid-array package.



Am29331 16-Bit Microprogram Sequencer ADVANCED INFORMATION

### DISTINCTIVE CHARACTERISTICS

### • 16-Bits Address Up to 64K Words

Supports 80–90ns microcycle time for a 32-bit high performance system when used with the other members of the Am29300 Family.

• Real Time Interrupt Support

Micro-TRAP and Interrupts are handled transparently at any microinstruction boundary.

• Built-In Conditional Test Logic

Generates inequality evaluation branch conditions from four ALU status bits. Has eight external tests plus a polarity input.

### Break-Point Logic

Built-in address comparator allows break-points in the microcode for debugging and statistics collection.

### Master/Slave Error Checking

Two sequencers can operate in parallel as a Master and a Slave. The Slave generates a fault flag for unequal results.

• 33-Level Stack

Provides support for interrupts, loops and subroutine nesting. It can be accessed through the D-bus to support diagnostics.

### GENERAL DESCRIPTION

The Am29331 is a 16-bit wide high-speed single chip sequencer designed to control the execution sequence of microinstruction stored in the microprogram memory. The instruction set is designed to resemble high-level language constructs, thereby bringing high-level language programming to the micro level.

The Am29331 is interruptible at any microinstruction boundary to support real-time interrupts. Interrupts are handled transparently to the microprogrammer as an unexpected procedure call. Traps are also handled transparently at any microinstruction boundary. This feature allows re-execution of a prior microinstruction. Two separate buses are provided to bring a branch address directly into the chip from two sources to avoid slow turn-on and turn-off times for different sources connected to the data input bus. Four sets of multiway inputs are also provided to avoid slow turn-on and turn-off times for different branch address sources. This feature allows implementation of table look-up or use of external conditions as part of a branch address. The thirtythree deep stack provides the ability to support interrupts, loops and subroutine nesting. The stack can be read through the D-bus to support diagnostics or to implement multitasking at the micro-architecture level. The master/slave mode provides a complete function check capability for the device.

The Am29331 is designed with the IMOX<sup>™</sup> process which allows internal ECL circuits with TTL-compatible I/O. It is housed in a 120-lead pin-grid-array package.



This document contains information on a product under development at Advanced Micro Devices, Inc. The information is intended to help you to evaluate this product. AMD reserves the right to change or discontinue work on this proposed product without notice. Order # 05729B

## Am29332

32-Bit Arithmetic Logic Unit

### ADVANCED INFORMATION

### **DISTINCTIVE CHARACTERISTICS**

- Single Chip, 32-Bit ALU Supports 80–90ns microcycle time for the 32-bit data path. It is a combinatorial ALU with equal cycle time for all instructions.
- Flow-through Architecture
   A combinatorial ALU with two input data ports and
   one output data port allows implementation of either
   parallel or pipelined architectures.
- 64-Bit In, 32-Bit Out Funnel Shifter This unique functional block allows n-bit shift-up, shift-down, 32-bit barrel shift or 32-bit field extract.
- Supports All Data Types
   It supports one-, two-, three- and four-byte data for
   all operations and variable-length fields for logical
   operations.
- Multiply and Divide Support Built-in hardware to support two-bit-at-a-time modified Booth's algorithm and one-bit-at-a-time division algorithm.
- Extensive Error Checking Parity check and generate provides data transmission check and master/slave mode provides complete function checking.

### GENERAL DESCRIPTION

The Am29332 is a 32-bit wide non-cascadable Arithmetic Logic Unit (ALU) with integration of functions that normally don't cascade, such as barrel shifters, priority encoders and mask generators. Two input data ports and one output data port provide flow-through architecture and allow the designer to implement his/her architecture with any degree of pipelining and no built-in penalties for branching. Also, the simplicity of a three-bus ALU allows easy implementation of parallel or reconfigurable architectures. The register file is off-chip to allow unlimited expansion and regular addressability.

The Am29332 supports one-, two-, three- and four-byte data for arithmetic and logic operations. It also supports

multiprecision arithmetic and shift operations. For logical operations, it can support variable-length fields up to 32 bits. When fewer than four bytes are selected, unselected bits are passed to the destination without modification. The device also supports two-bit-at-a-time modified Booth's algorithm for high-speed multiplication and one-bit-at-a-time division. Both signed and unsigned integers for all byte aligned data types mentioned above are supported.

The Am29332 is designed to support 80-90 ns microcycle time. The device is packaged in a 168-lead pin-grid-array package.



This document contains information on a product under development at Advanced Micro Devices, Inc. The information is intended to help you to evaluate this product. AMD reserves the right to change or discontinue work on this proposed product without notice.

## Am29334

Four-Port, Dual-Access Register File

### ADVANCED INFORMATION

### DISTINCTIVE CHARACTERISTICS

#### Fast

With an access time of 20ns, the Am29334 supports 80-90ns microcycle time when used with the Am29300 Family for 32-bit systems.

 64 x 18 Bits Wide Register File The Am29334 is a high-performance, high-speed, dual-access RAM with two READ ports and two WRITE ports.

#### Cascadable The Am29334 is cascadable to support either wider word widths, deeper register files, or both.

• Simplified Timing Control

Control for write enable timing and for on-chip read/write multiplexer are derived from a single-phase clock input.

- Byte Parity Storage Width of 18 bits facilitates byte parity storage for each port and provides consistency with the Am29332 32-bit ALU.
- Byte Write Capability Individual byte-write enables allows byte or full word write.

### **GENERAL DESCRIPTION**

The Am29334 is a 64-word deep and 18-bit wide dualaccess register file designed to support other members of the Am29300 Family by providing high-speed storage. It has two write and two read ports for data and four 6-bit address ports. Two address ports are associated with each pair of read and write data ports, one to read data and the other to write. The device is capable of performing two reads and two writes in one cycle. The 18-bit wide register file allows storage of byte parity to support parity check and generate in the Am29332 32-bit ALU. Independent control for each read and write data port allows the Am29334 to be used as a high-speed shared memory or as a mailbox for a multiprocessor system. The device is designed with an access time of 20ns. It is housed in a 120 lead-pin-gridarray package.



This document contains information on a product under development at Advanced Micro Devices, Inc. The information is intended to help you to evaluate this product. AMD reserves the right to change or discontinue work on this proposed product without notice.

# Am2910A

Microprogram Controller

#### **DISTINCTIVE CHARACTERISTICS**

### Twelve Bits Wide Addresses up to 4096 words of microcode with one chip. All internal elements are a full 12 bits wide. Internal Loop Counter

- Internal Loop Counter
   Pre-settable 12-bit down-counter for repeating instructions and counting loop iterations.
- Four Address Sources Microprogram Address may be selected from microprogram counter, branch address bus, 9-level push/pop stack, or internal holding register.
- Sixteen Powerful Microinstructions Executes 16 sequence control instructions, most of which are conditional on external condition input, state of internal loop counter, or both.
- Output Enable Controls for Three Branch Address Sources

Built-in decoder function to enable external devices onto branch address bus. Eliminates external decoder.

Fast

The Am2910A supports 100ns cycle times and is 25 - 30% faster than the Am2910.

### **GENERAL DESCRIPTION**

The Am2910A Microprogram controller is an address sequencer intended for controlling the sequence of execution of microinstructions stored in microprogram memory. Besides the capability of sequential access, it provides conditional branching to any microinstruction within its 4096-microword range. A last-in, first-out stack provides microsubroutine return linkage and looping capability; there are nine levels of nesting of microsubroutines. Microinstruction loop count control is provided with a count capacity of 4096.

During each microinstruction, the microprogram controller provides a 12-bit address from one of four sources: 1) the

microprogram address register ( $\mu$ PC), which usually contains an address one greater than the previous address; 2) an external (direct) input (D); 3) a register/counter (R) retaining data loaded during a previous microinstruction; or 4) a nine-deep last-in, first-out stack (F).

The Am2910A is a speed improved plug-in replacement of the Am2910 featuring AMD's ion-implanted micro-oxide (IMOX) processing and offering 25 – 30% speed improvement. The Am2910A also features a nine-word deep stack versus the five-deep stack of the Am2910.



# Am29C10A

CMOS Microprogram Controller

### PRELIMINARY

#### DISTINCTIVE CHARACTERISTICS

- Low power ٠ The CMOS Am29C10A supports 125 ns cycle times at
  - 20% the power of the equivalent bipolar Am2910A. Twelve bits wide
  - Addresses up to 4096 words of microcode with one chip. All internal elements are a full 12 bits wide.
- Internal loop counter Pre-settable 12-bit down-counter for repeating instructions and counting loop iterations.
- Four address sources

Microprogram address may be selected from microprogram counter, branch address bus, 9-level push/pop stack, or internal holding register.

- Sixteen powerful microinstructions Executes 16 sequence control instructions, most of which are conditional on external condition input, state of internal loop counter, or both.
- Output Enable controls three branch-address sources

Built-in decoder function to enable external devices onto branch address bus. Eliminates external decoder.

### GENERAL DESCRIPTION

The Am29C10A Microprogram Controller is an address sequencer intended for controlling the sequence of execution of microinstructions stored in microprogram memory. Besides the capability of sequential access, it provides conditional branching to any microinstruction within its 4096-microword range. A last-in, first-out stack provides microsubroutine return linkage and looping capability; there are nine levels of nesting of microsubroutines. Microinstruction loop count control is provided with a count capacity of 4096.

**BLOCK DIAGRAM** 

During each microinstruction, the Microprogram Controller provides a 12-bit address from one of four sources: 1) the Microprogram Address Counter/Register (µPC), which usually contains an address one greater than the previous address: 2) an external (Direct) input (D); 3) a Register/ counter (R) retaining data loaded during a previous microinstruction: or 4) a nine-deep last-in. first-out stack/File (F).

The Am29C10A is a CMOS plug-in replacement of the Am2910A. The Am29C10A-10 is a 10 MHz version and the Am29C10A-20 is a 20 MHz version.



### **RELATED PRODUCTS**

| Part No.                                  | Description                      |  |  |  |  |  |  |
|-------------------------------------------|----------------------------------|--|--|--|--|--|--|
| Am29C101                                  | 16-Bit CMOS Microprocessor Slice |  |  |  |  |  |  |
| Am2914                                    | Vectored Interrupt Controller    |  |  |  |  |  |  |
| Am2918                                    | Pipeline Register                |  |  |  |  |  |  |
| Am2922                                    | Condition Code MUX               |  |  |  |  |  |  |
| Am25LS377                                 | Status Register                  |  |  |  |  |  |  |
| Am27S35                                   | Registered PROM                  |  |  |  |  |  |  |
| Am29818 SSR Diagnostics/Pipeline Register |                                  |  |  |  |  |  |  |

# Am29112

### A High-Performance 8-Bit Slice Microprogram Sequencer

ADVANCED INFORMATION

### DISTINCTIVE CHARACTERISTICS

#### • Expandable

8-bit Slice, cascadable up to 16-bits

- Deep stack A 33 deep on-chip stack is used for subroutine linkage, interrupt handling and loop control.
- Hold feature
   A hold pin facilitates multiple sequencer implementations.
- Interruptible at the microprogram level
   Two kinds of interrupts: maskable and unmaskable.
- Powerful loop control When cascaded, two counters can act as a single 16-bit counter or two independent 8-bit counters.
- Powerful addressing modes
   Features direct, multiway, multiway relative and program counter relative addressing.

### **GENERAL DESCRIPTION**

The Am29112 is a high performance interruptible microprogram sequencer intended for use in very high speed microprogrammed machines and optimized for the new state-of-the-art ALU's and other processing components.

The Am29112 is designed to operate in 10MHz microprogrammed systems.

It has an instruction set featuring relative and multiway branching, a rich variety of looping constructs, and provision for loading and unloading the on-chip stack.



#### **APPLICATION NOTES REFERENCE**

- Microprogrammed CPU using Am29116
- An intelligent fast disk controller
- Am29116 architecture speeds pixel manipulation in interactive bit-mapped graphics



### BLOCK DIAGRAM

# Am29C116

16-Bit CMOS Microprocessor

### PRELIMINARY DISTINCTIVE CHARACTERISTICS

control.

• Powerful Field Insertion/Extraction and

Rotate-and-Merge, Rotate-and-Compare and bit-

manipulation instructions provided for complex bit

Issue Date: April 1986

**Bit-Manipulation Instructions** 

Very Low Power

bipolar Am29116.

The CMOS Am29C116 supports 125-ns microcycle

times and requires less than 1/4 the power of the

• Pin-Compatible and Functionally Equivalent to the



## Am29PL141

Fuse Programmable Controller (FPC)

### PRELIMINARY

### DISTINCTIVE CHARACTERISTICS

- Implements complex fuse programmable state ma-• chines
- 64 words of 32-bit-wide microprogram memory Serial Shadow Register (SSR<sup>TM</sup>) diagnostics on chip (programmable option)
- 20 MHz clock rate, 28-pin DIP

- 29 high-level microinstructions
  - Conditional branching
  - Conditional looping
  - Conditional subroutine call
  - Multiway branch
- 16 outputs, 7 conditional inputs

### **GENERAL DESCRIPTION**

The Am29PL141 is a single-chip Fuse Programmable Controller (FPC) which allows implementation of complex state machines and controllers by programming the appropriate sequence of microinstructions. A repertoire of jumps, loops, and subroutine calls, which can be conditionally executed based on the test inputs, provides the designer with powerful control flow primitives.

The Am29PL141 FPC also allows distribution of intelligent control throughout the system. It off-loads the central controller by distributing FPCs as the control for various

self-contained functional units, such as register file/ALU. I/O, interrupt, diagnostic, and bus control units.

A microprogram address sequencer is the heart of the FPC. It provides the microprogram address to an internal 64word by 32-bit PROM. The fuse programming algorithm is almost identical to that used for AMD's Programmable Array Logic family.

As an option, the Am29PL141 may be programmed to have on chip SSR diagnostics capability. Microinstructions can be serially shifted in, executed, and the results shifted out to facilitate system diagnostics.



SSR is a trademark of Advanced Micro Devices. Inc.

## APPENDIX 1 PAL EQUATION

| DEVICE DSP_PAL_U72 | 2 (pa  | 122V10)                              |   |
|--------------------|--------|--------------------------------------|---|
| PIN                |        |                                      |   |
|                    | 1<br>2 | /BUSY = 22<br>/BHEN = 21             |   |
|                    | 2<br>3 | HACK = 20                            |   |
| /BYTEH =           | 4      | /BPRO = 19                           |   |
|                    | 5      | /BREQ = 18                           |   |
|                    | 6<br>7 | /CDRI = 17<br>/CDRR = 16             |   |
|                    | 8      | /CDLI = 15                           |   |
|                    |        | /CDLR = 14;                          |   |
| BEGIN              |        |                                      |   |
| BEGIN              |        |                                      |   |
| BREQ               | : =    | HREQ;                                |   |
| /HACK              | : =    | /BUSY;                               |   |
| BUSY               | : =    | BREQ * BPRN * /BSY +<br>BREQ * BUSY; | • |
| END;               |        |                                      |   |
| BEGIN              |        |                                      |   |
| BPRO               | =      | /HREQ * BPRN;                        |   |
| BHEN               | Ξ      | B115V •                              |   |
| DILA               | -      | BUSY;                                |   |
| CDLR               | =      | /Q * HREQ * /FLY0 +                  | - |
|                    |        | /Q * BYTEH * /FLY0 +                 | • |
|                    |        | /Q * BYTEL * /FLY0;                  |   |
| CDLI               | =      | /Q * HREQ * FLY0 +                   | - |
|                    |        | /Q * BYTEH * FLYO +                  | • |
|                    |        | /Q * BYTEL * FLY0;                   |   |
| CDRR               | =      | Q * HREQ * /FLYO +                   |   |
|                    |        | Q * BYTEH * /FLYO +                  | • |
|                    |        | Q * BYTEL * /FLY0;                   |   |
| CDRI               | =      | Q * HREQ * FLYO +                    |   |
| •••                | -      | Q * BYTEH * FLYO +                   |   |
| END.               |        | Q * BYTEL * FLYO;                    |   |
| END;<br>END.       |        |                                      |   |
|                    |        |                                      |   |

DEVICE DSP\_PAL\_U73 (pal22V10)

PIN

| ΓIN   |        | = 2 /WE7 = 20<br>= 3 /WE6 = 19<br>= 4 /WE5 = 18<br>= 5 /WE4 = 17<br>= 6 /WE3 = 16<br>= 7 /WE2 = 15<br>= 8 /WE1 = 14<br>= 9 /DMAH = 22 |             |
|-------|--------|---------------------------------------------------------------------------------------------------------------------------------------|-------------|
| BEGIN | WE1 =  | Q * /WER<br>/Q * HREQ * IOW * /FLYO<br>/Q * /FLYO * IOW * BYTEH;                                                                      | +<br>+      |
|       | WE2 =  | Q * /WER<br>/Q * HREQ * IOW * /FLYO<br>/Q * /FLYO * IOW * BYTEH * BHEN<br>/Q * IOW * /FLYO * BYTEL;                                   | +++         |
|       | WE3 =  | Q * /WEI<br>/Q * HREQ * IOW * FLYO<br>/Q * FLYO *IOW * BYTEH;                                                                         | +<br>+      |
|       | WE4 =  | Q * /WEI<br>/Q * HREQ * IOW * FLYO<br>/Q * FLYO * IOW * BYTEH * BHEN<br>/Q * IOW * FLYO * BYTEL;                                      | +<br>+<br>+ |
|       | WE5 =  | /Q * /WER<br>Q * HREQ * IOW * /FLYO<br>Q * /FLYO * IOW * BYTEH;                                                                       | +<br>+      |
|       | WE6 =  | /Q * /WER<br>Q * HREQ * IOW * /FLYO<br>Q * /FLYO * IOW * BYTEH * BHEN<br>Q * IOW * /FLYO * BYTEL;                                     | +<br>+<br>+ |
|       | WE7 =  | /Q * /WEI<br>Q * HREQ * IOW * FLYO<br>Q * FLYO * IOW * BYTEH;                                                                         | +<br>+      |
|       | WE8 =  | /Q * /WEI<br>Q * HREQ * IOW * FLYO<br>Q * FLYO * IOW * BYTEH * BHEN<br>Q * IOW * FLYO * BYTEL;                                        | +<br>+<br>+ |
|       | DMAH = | IOW * HREQ<br>IOR * HREQ;                                                                                                             | +           |

END.

### PIN

| PIN   | (700                                                               |   |                                          |                                                                               |                                                                      |                                                                                                 |               |
|-------|--------------------------------------------------------------------|---|------------------------------------------|-------------------------------------------------------------------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|---------------|
|       | /IOR<br>/EOUT<br>/AO<br>/A1<br>/A2<br>/A4<br>/IOW<br>HREQ<br>/BHEN |   | 2<br>3<br>4<br>5<br>6<br>7<br>8          | /BYTEH<br>/BYTEL<br>/INIT<br>/LOADL<br>/LOADH<br>CD3<br>CD2<br>/POLL<br>INSTR | = 23<br>= 22<br>= 21<br>= 20<br>= 19<br>= 18<br>= 17<br>= 16<br>= 15 |                                                                                                 |               |
|       | /HINIT                                                             | = | 10                                       | /ADDRL                                                                        |                                                                      |                                                                                                 |               |
| BEGIN | /ADSTB<br>/INSTR                                                   | = | 11;<br>A4<br>/EOUJ                       | ſ                                                                             |                                                                      |                                                                                                 | +             |
|       |                                                                    |   | /IOW<br>/A2<br>/A1<br>/A0;               |                                                                               |                                                                      |                                                                                                 | +<br>+<br>+   |
|       | INIT                                                               | = | HINI1<br>/A4 *                           |                                                                               | IOW *                                                                | A2 * A1 * /A0;                                                                                  | +             |
|       | ADDRL                                                              | = | /A4 *                                    | EOUT *                                                                        | IOW *                                                                | A2 * /A1 * A0;                                                                                  |               |
|       | LOADH                                                              | = | /A4 *                                    | • EOUT *                                                                      | IOW *                                                                | A2 * /A1 * /A0;                                                                                 |               |
|       | LOADL                                                              | = | /A4 *<br>/A4 *                           | EOUT *<br>EOUT *                                                              | IOW *<br>IOW *                                                       | /A2 * A1 * A0<br>BHEN * A2 * /A1 * /A0;                                                         | +             |
|       | BYTEH                                                              | = | /д4 *<br>/д4 *                           | EOUT *<br>EOUT *                                                              | IOW *<br>IOR *                                                       | /A2 * A1 * /A0<br>/A2 * A1 * /A0;                                                               | +             |
|       | BYTEL                                                              | = | /A4 *<br>/A4 *                           | EOUT *<br>EOUT *                                                              | IOW *<br>IOR *                                                       | /A2 * /A1 * A0<br>/A2 * /A1 * A0;                                                               | +             |
|       | POLL                                                               | = | /A4 *                                    | EOUT *                                                                        | IOR *                                                                | A2 * A1 * A0;                                                                                   |               |
|       | /CD2                                                               | = | /A4 *                                    | EOUT *                                                                        | IOR *                                                                | /BHEN * /A2 * A1 * /AO<br>/BHEN * /A2 * A1 * /AO<br>/BHEN * A2 * /A1 * /AO;                     | +<br>+        |
|       | /CD3                                                               | = | /A4 *<br>/A4 *<br>/A4 *<br>/A4 *<br>A4 * | EOUT *<br>EOUT *                                                              | IOR *<br>IOW *<br>IOR *<br>IOW *<br>IOW<br>IOR                       | AO<br>/A2 * /A1 * AO<br>BHEN * /A2 * A1 * /AO<br>BHEN * /A2 * A1 * /AO<br>BHEN * A2 * /A1 * /AO | + + + + + + + |

END.

### APPENDIX 2 AmDSP DIGITAL SIGNAL PROCESSOR DEFINITION FILE

WORD 128

|                                 |        | ==<br>1<br>2     | = =<br>1<br>2              |                  |                  |                       |                       |                       |                       |                       |                  |                  |                       |                       |                       |                 |                  | Ů              |                  | Û           | 0                |                 |                  | Ů                |                  | ΰ                |                  |                  | = = =<br>9       | 9                | -                |
|---------------------------------|--------|------------------|----------------------------|------------------|------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|------------------|------------------|-----------------------|-----------------------|-----------------------|-----------------|------------------|----------------|------------------|-------------|------------------|-----------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| ; <u>~</u><br>; S<br>; R        | Š<br>R | R<br>A<br>L<br>U | 4<br>R<br>A<br>L<br>U<br>2 | R<br>A<br>L<br>U | R<br>A<br>L<br>U | R<br>R<br>O<br>P      | R<br>R<br>R<br>O<br>P | R<br>R<br>R<br>O<br>P | R<br>S<br>O<br>P      | R<br>S<br>O<br>P      | R<br>S<br>O<br>P | R<br>R<br>A<br>1 | R<br>R<br>A<br>1      | R<br>R<br>A<br>2      | R<br>R<br>A<br>2      | <br>R<br>A<br>3 | R<br>R<br>A<br>3 | R<br>B<br>1    | R<br>R<br>B<br>1 | R<br>B<br>2 | R<br>R<br>B<br>2 | <br>R<br>B<br>3 | R<br>R<br>B<br>3 | R<br>M<br>I<br>O | R<br>M<br>I<br>O | R<br>M<br>I<br>O | R<br>M<br>I<br>O | R<br>D<br>I<br>O | R                | R<br>D<br>I<br>O | R<br>W           |
| ;;                              |        | -                |                            |                  |                  |                       |                       |                       |                       |                       |                  |                  |                       |                       | AI                    |                 |                  |                |                  |             |                  |                 |                  | 5                |                  | ·                |                  |                  |                  |                  |                  |
| ; 9<br>; 5                      | 9<br>4 | 9<br>3           | = =<br>9<br>2              | 9<br>1           | 9<br>0           | 8<br>9                | 8<br>8                | 8<br>7                | 8<br>6                | 8<br>5                | 8<br>4           | 8<br>3           | 8<br>2                | 8<br>1                | 8<br>0                | 7<br>9          | 7<br>8           | 7<br>7         | 7<br>6           | 7<br>5      | 7<br>4           | 7<br>3          | 7<br>2           | 7<br>1           | 7<br>ს           | 6<br>9           | 6<br>8           | 6<br>7           | 6<br>6           | 6<br>5           |                  |
| ; S<br>; I                      | S<br>I | I<br>A<br>L<br>U | I<br>A<br>L<br>U<br>2      | I<br>A<br>L<br>U | I<br>A<br>L<br>U | I<br>R<br>O<br>P      | I<br>R<br>O<br>P      | I<br>R<br>O<br>P      | I<br>S<br>O<br>P      | I<br>S<br>O<br>P      | I<br>S<br>O<br>P | I<br>A<br>1      | I<br>A<br>1           | I<br>A<br>2           | I<br>A<br>2           | I<br>A<br>3     | I<br>A<br>3      | I<br>B<br>1    | I<br>B<br>1      | I<br>B<br>2 | I<br>B<br>2      | I<br>B<br>3     | I<br>B<br>3      | I<br>M<br>I<br>O | I<br>M<br>I<br>O | I<br>M<br>I<br>O | I<br>M<br>I<br>O | I<br>D<br>I<br>O | I<br>D<br>I      | I<br>D<br>I<br>O | I<br>W<br>E      |
| ;<br>;<br>; =                   | = =    | = =              | = = 1                      | = = :            | = = :            | = = :                 | . = :                 | :                     | . = :                 | :                     |                  |                  |                       | -                     |                       |                 |                  | A L I<br>= = : |                  | . = :       | . = :            | . = :           | . = :            | . = :            | = = :            | . = :            | . = :            | . = :            | . = :            | . = :            | :=               |
| ; 3                             | 2      | 1                | 6<br>0                     | 9                | 8                | 7                     | 6                     | 5                     | 4                     | 3                     | 2                | 1                | Û                     | 9                     | 8                     | 7               | 6                | 5              | 4                | 3           | 2                | 1               | Û                | 9                | 8                | 7                | 6                | 5                |                  | 3                | 2                |
| ; D                             | R      | F<br>A           | M<br>S                     | 0                | 2                | X<br>S<br>E<br>L      | X<br>S                | E<br>N<br>X           | S<br>I<br>N<br>C      | C<br>F<br>A           | 1<br>1<br>6<br>0 | 5<br>4<br>0<br>0 | A<br>G<br>1           | A<br>G<br>1           | A<br>G<br>1           | A<br>G<br>1     | A<br>G<br>1      | A<br>G<br>1    | A<br>G           | A<br>G<br>1 | A<br>G<br>1      | A<br>G          | A<br>G           | A<br>G           | A<br>G           | A<br>G           | A<br>G           | A<br>G           | A                | A<br>G           | A<br>G           |
| ;;                              |        |                  | = = :                      |                  |                  | tip                   | oli<br>==:            |                       |                       |                       |                  |                  |                       |                       |                       |                 |                  |                |                  |             |                  |                 |                  |                  |                  | ato              |                  |                  | . = =            |                  |                  |
| ; 3<br>; 1                      | 3<br>0 | 2<br>9           | 2<br>8                     | 2<br>7           | 2<br>6           | 2<br>5                | 2<br>4                | 2<br>3                | 2<br>2                | 2<br>1                | 2                | 1                | 1<br>8                | 1<br>7                | 1<br>6                | 1<br>5          | 1<br>4           | 3              | 1<br>2           | 1<br>1      | 1<br>0           | 9               |                  |                  |                  |                  |                  |                  |                  |                  |                  |
| ; B<br>; U<br>; F<br>; C<br>; D |        | <u>بم</u> د مدر  | × × .                      | 5<br>2<br>0<br>I | 5<br>2<br>0<br>I | 5<br>2<br>0<br>S<br>1 | 5<br>2<br>0<br>S      | I<br>N<br>T<br>R      | C<br>K<br>S<br>E<br>L | C<br>K<br>S<br>E<br>L | C<br>P<br>O<br>L | C<br>S<br>E      | C<br>C<br>S<br>E<br>L | C<br>C<br>S<br>E<br>L | C<br>C<br>S<br>E<br>L | B<br>R<br>1     | B<br>R<br>1      | B<br>R<br>9    | R<br>8           | B<br>R<br>7 | B<br>R<br>6      | B<br>R<br>5     | R<br>4           | R<br>3           | R<br>2           | R<br>1           | R<br>U           | Ι                | 1<br>U<br>I<br>2 |                  | 1<br>0<br>I<br>U |
| ;<br>; =<br>EJE<br>;            |        |                  | = = :                      | = = :            | = = :            | = = :                 |                       |                       |                       |                       |                  |                  |                       |                       |                       | : = :           | = = :            |                |                  |             |                  | -               |                  |                  | = = :            | = = :            | : = :            | . = :            | :=:              | : = :            | :=               |

; D 5 5 5 5 5 5 ; Ι 4 444 4 4 Т 4 P ; 0 0 0 0 0 0 1 S Ι 1 Ι S S S S ; F 2 D 1 Û 3 2 1 Û ; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ; 1 1 ; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 6 6 6 6 6 6 ; 6 6 6 6 6 6 6 6 6 6 ; SDI Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι Ι RL Ε 1 1 1 1 1 1 0 ΰ 0 0 0 Û 0 ΰ 0 ΰ ; EEN 1 0 9 8 7 6 5 ; 5 4 3 2 43210 ; ; AG18 ⊷00 EJECT ; : ; Am29501 INSTRUCTION SET ; ; ; ; Am29501 ALU FUNCTIONS (R FUNCTION S) ; ; CONDITIONAL CARRY OUTPUT ; ; ADD.CC: EQU H#0 SUBS.CC: EQU H#4 MOVE.CC: EQU H#8 SUBR.CC: H#C EQU ; ; INHIBIT CARRY OUTPUT ; ADD.NC: EQU H#1 SUBS.NC: EQU H#5 MOVE.NC: EQU H#9 SUBR.NC: EQU H#D ; FORCE CARRY OUTPUT ; ADD.FC: EQU H#2 SUBS.FC: EQU H#6 MOVE.FC: EQU H#A SUBR.FC: EQU H#E ; ;LOGICAL (NO CARRY USED TO DISTINGUISH FROM 29116 LOGICALS) : XOR.NC: EQU H#3 AND.NC: EQU H#7 INV.NC: EQU H∦B OR.NC: H**#**F EQU ; Am29501 SOURCE OPERANDS R. S ; ; A 1: EQU Q#U

| A2:<br>A3:<br>B1:<br>B2:<br>B3:<br>SIGN.EXT:<br>ZERO:<br>MSP:<br>LSP:<br>EJECT<br>;<br>PORT SELEC                                                                        | EQU<br>EQU<br>EQU<br>EQU<br>EQU<br>EQU<br>EQU<br>EQU             | Q#1<br>Q#2<br>Q#3<br>Q#4<br>Q#5<br>Q#6<br>Q#6<br>Q#7                                 |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| ;<br>M.EQ.A1:<br>M.EQ.A2:<br>M.EQ.A3:<br>M.EQ.B1:<br>M.EQ.B2:<br>M.EQ.B3:<br>M.EQ.AU:<br>M.EQ.AU:<br>M.EQ.DI:<br>MIO.IN:                                                 | E QU<br>E QU<br>E QU<br>E QU<br>E QU<br>E QU<br>E QU<br>E QU     | H#0<br>H#1<br>H#2<br>H#3<br>H#4<br>H#5<br>H#5<br>H#6<br>H#7<br>H#8                   |
| ;<br>D.EQ.A2:<br>D.EQ.A3:<br>D.EQ.B2:<br>D.EQ.B3:<br>DIO.IN:<br>;<br>Am29501 Re;<br>A1.EQ.MP:                                                                            | EQU<br>EQU<br>EQU<br>EQU<br>EQU<br>egister Ope:<br>EQU           | Q#0<br>Q#1<br>Q#2<br>Q#3<br>Q#4<br>rations<br>B#00                                   |
| A 1. EQ. DI:<br>A 1. EQ. B3:<br>A 1. HOLD:<br>A 2. EQ. LP:<br>A 2. EQ. AU:<br>A 2. EQ. AU:<br>A 2. EQ. A1:<br>A 2. HOLD:<br>A 3. EQ. MP:<br>A 3. EQ. AU:<br>A 3. EQ. A2: |                                                                  | B#01<br>B#10<br>B#11<br>B#00<br>B#01<br>B#10<br>B#11<br>B#00<br>B#01<br>B#10         |
| A3. HOLD:<br>B1.EQ.MP:<br>B1.EQ.DI:<br>B1.EQ.A3:<br>B1.HOLD:<br>B2.EQ.LP:<br>B2.EQ.AU:<br>B2.EQ.B1:<br>B2.HOLD:<br>B3.EQ.MP:<br>B3.EQ.AU:<br>B3.EQ.B2:<br>B3.HOLD:       | EQU<br>EQUU<br>EQUU<br>EQUU<br>EQUU<br>EQUU<br>EQU<br>EQU<br>EQU | B#11<br>B#00<br>B#01<br>B#10<br>B#11<br>B#00<br>B#01<br>B#10<br>B#01<br>B#10<br>B#11 |

EJECT :REAL ALU INSTRUCTIONS R.ADD: DEF 2X, ADD.CC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11.4VH#8.3VQ#4.97X 2X, SUBS.CC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11. R.SUBS: DEF 2VB#11,4VH#8,3VQ#4,97X R.MOVE: DEF 2X, MOVE.NC, 3VQ#0, 3X, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11.4VH#8.3VQ#4.97X 2X.SUBR.CC.3VQ#0.3VQ#0.2VB#11.2VB#11.2VB#11.2VB#11.2VB#11. R.SUBR: DEF 2VB#11,4VH#8,3VQ#4,97X R.XOR: DEF 2X, XOR.NC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11.4VH#8.3VQ#4.97X R.AND: DEF 2X.AND.NC.3VQ#0.3VQ#0.2VB#11.2VB#11.2VB#11.2VB#11.2VB#11.2VB#11. 2VB#11.4VH#8.3VQ#4.97X 2X. INV. NC. 3VQ#0. 3VQ#0. 2VB#11. 2VB#11. 2VB#11. 2VB#11. 2VB#11. R.INV: DEF 2VB#11.4VH#8.3VQ#4.97X 2X, OR.NC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, R.OR: DEF 2VB#11.4VH#8.3VQ#4.97X ; IMAGINARY ALU INSTRUCTIONS I.ADD: DEF 34X, ADD.CC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11.4VH#8.3VQ#4.65X 34X, SUBS.CC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11. I.SUBS: DEF 2VB#11,4VH#8,3VQ#4,65X I.MOVE: DEF 34X,MOVE.NC,3VQ#0,3X,2VB#11,2VB#11,2VB#11,2VB#11,2VB#11,2VB#11, 2VB#11,4VH#8,3VQ#4,65X I.SUBR: DEF 34X,SUBR.CC,3VQ#0,3VQ#0,2VB#11,2VB#11,2VB#11,2VB#11,2VB#11,2VB#11, 2VB#11.4VH#8.3VQ#4.65X I.XOR: DEF 34X, XOR.NC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11.4VH#8.3VQ#4.65X I.AND: DEF 34X, AND.NC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11.4VH#8.3VQ#4.65X I.INV: DEF 34X, INV. NC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11,4VH#8,3VQ#4,65X 34X, OR.NC, 3VQ#0, 3VQ#0, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, 2VB#11, I.OR: DEF 2VB#11.4VH#8.3VQ#4.65X EJECT ; \* \* \* \* \* \* \* \* \* \* \* × : Am29517 MULTIPLIER : \* \* \* \* \* \* \* \* \* \* × \* \* \* \* × \* \* MX. RALU: EQU H#O :MULTIPLIER IS REAL ALU MIO MX.HOLD: EQU H#2 **;USE PREVIOUS MULTIPLIER** MX. IMAG: EQU H#4 :MULTIPLIER IS IMAG MEM ;MULTIPLIER IS REAL MEM MX.REAL: EQU H#8 ;MULTIPLIER IS COEF ROM MX.CONST: EQU H∦C MX.COS: H∦C :MULTIPLIER IS FFT COS ROM EQU MX.SIN: EQU H#D :MULTIPLIER IS FFT SIN ROM MXY.2C: :MULTIPLIERS ARE TWO'S COMP EQU B#1 ;UNSIGNED MAGNITUDE EQU MXY.2C\* MXY.MAG:

ENABLE LSP ON Y

MY.OE:

EQU

B#0

MY.IN: EQU MY.OE\* ENABLE Y INPUT MP.FRAC: EQU B#∪ FRACTIONAL MULTIPLY (FA) MP.FRAC\* MP.INTG: **;INTEGER MULTIPLY** EQU B#1 ; ROUNDED MSP MP. ROUND: EQU **:**TRUNCATED MSP MP.ROUND\* MP.TRUNC: EQU ;MOST SIGNIFICANT PRODUCT OUT MP.MSP: EQU B#0 MP.MSP\* :LEAST SIGNIFICANT PRODUCT OUT MP.LSP: EQU MP.OE: EQU B#0 **:OUTPUT ENABLE** : ; MULTIPLIER INSTRUCTIONS :MOST SIGNIFICANT PRODUCT MSPROD: DEF 65X,1VB#1,1VB#0,MP.MSP ,1VB#1,1VB#1,4VH#2 1 ,22X,1VB#1,31X 1 LSPROD: DEF 65X, 1VB#1, 1VB#0, MP.LSP :LEAST SIGNIFICANT PRODUCT ,1VB#1,1VB#1,4VH#2 1 ,22X,1VB#1,31X 1 : \* \* \* \* \* \* \* \* \* \* \* \* \* : Am25S10 SHIFTER : \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* : NO.SHIFT: :LSB CONNECTED TO I(-3) EQU B#11 SHIFT.R1: EQU B#10 B#01 SHIFT.R2: EQU SHIFT.R3: EQU B#00 SHIFT.OE: B#0 EQU EJECT : \* \* \* \* \* \* \* \* \* \* × ¥ \* \* \* \* ¥ \* \* \* \* \* \* \* \* \* \* \* \* ; : : INDEX TO Am29116 INSTRUCTIONS - [i] REFERS TO ALLOWED MNEMONICS GROUP ; ; SINGLE OPERAND [1], [2], [3], [4] ; [5], [6], [7], [8] TWO OPERAND ; SHIFT [9], [10], [11] ; [12], [13], [14] ROTATE ; BIT -ORIENTED [15], [16], [17] ; ROTATE & MERGE [18] ; ROTATE & COMPR [19] ; [20], [21], [22], [23], [24], [25] PRIORITIZE ; CYCLIC REDUNDANCY CHECKS ; NOOP ; STATUS [26], [27] ; TEST STATUS [CT] ; ; ; GENERAL MNEMONICS : : ; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; BYTE - WORD MODE SELECT [M] ; :

| ;<br>B:<br>W:<br>;                   | E Q U<br>E Q U    | 1B#0<br>1B#1     | ; BYTE MODE<br>; WORD MODE    |
|--------------------------------------|-------------------|------------------|-------------------------------|
| ; 32 RAM REGIST                      |                   |                  |                               |
| ,                                    | ************      | ******           | * * * * * * * * * * * * * * * |
| ;<br>R 0 :                           | EQU               | 5D#0%%           | ; 00000                       |
| R 1:                                 | EQU               | 5D#0%<br>5D#1%   | ; 00000                       |
| R 2:                                 | EQU               | 5D#2%            | ,<br>,                        |
| R3:                                  | EQU               | 5D#3%            | ,                             |
| R4:                                  | EQU               | 5D#4%            | ;                             |
| R5:                                  | EQU               | 5,D#5%           | •                             |
| R6:                                  | EQU               | 5D#6%%           | ;                             |
| R7:                                  | EQU               | 5D#7%            | ;                             |
| R8:                                  | EQU               | 5D#8%            | ;                             |
| R9:                                  | EQU               | 5D#9%            | ;                             |
| R 10:<br>R 11:                       | EQU               | 5D#10%           | ;                             |
| R 12:                                | EQU<br>EQU        | 5D#11%<br>5D#12% | •                             |
| R 13:                                | EQU               | 5D#12%<br>5D#13% | ;                             |
| R 14:                                | EQU               | 5D#14%           | ;                             |
| R 15:                                | EQU               | 5D#15%           | 5                             |
| R16:                                 | EQU               | 5D#16%           | •                             |
| R17:                                 | EQU               | 5D#17%/          | •                             |
| R18:                                 | EQU               | 5D#18%           | ;                             |
| R 19:                                | EQU               | 5D#19%           | ;                             |
| R20:                                 | EQU               | 5D#20%           | ;                             |
| R21:                                 | EQU               | 5D#21%           | ;                             |
| R22:<br>R23:                         | EQU<br>EQU        | 5D#22%<br>5D#23% | ;                             |
| R24:                                 | EQU               | 5D#23%<br>5D#24% | ,<br>,<br>,                   |
| R 25:                                | EQU               | 5D#25%           | ;                             |
| R 26:                                | EQU               | 5D#26%           | ,                             |
| R 27:                                | EQU               | 5D#27%           | *                             |
| R28:                                 | EQU               | 5D#28%           | ;                             |
| R29:                                 | EQU               | 5D#29%           | ;                             |
| R 30:                                | EQU               | 5D#30%           | ;                             |
| R31:                                 | EQU               | 5D#31%           | ;                             |
| EJECT                                |                   |                  |                               |
| ; * * * * * * * *<br>; Am29116 CONTF |                   | * *              |                               |
|                                      | ar Microprocessor |                  |                               |
| ; * * * * * * * *                    |                   |                  |                               |
| ;                                    |                   |                  |                               |
| OEYEN:                               | EQU               | B#∪              | ; Y BUS ENABLE                |
| OEYDIS:                              | EQU               | B#1              |                               |
| ;                                    |                   |                  |                               |
| DLE.EN:                              | EQU               | B#1              | ; DATA LATCH ENABLE           |
| DLE.DIS:                             | EQU               | B#0              |                               |
| ;<br>OETEN:                          | FOU               | D#1              | : T BUS ENABLE                |
| OETDIS:                              | EQU<br>EQU        | B#1<br>B#∪       | ; T BUS ENABLE                |
| OBIDIO.                              | 1. M.O.           | 010              |                               |

SRE.EN: EQU SRE.DIS: FOU B#0 : STATUS REGISTER ENABLE B#1 : IEN: EQU B#0 ; INSTRUCTION ENABLE IDIS: EQU B#1 ; : : EJECT : SINGLE OPERAND INSTRUCTIONS \*\*\*\*\*\*\*\*\* : : OPCODES [1] : MOVE: EQU H#C H#D ; COMPLEMENT H#E ; INCREMENT H#F ; NEGATE EQU COMP: EQU INC: NEG: EQU : SOURCE-DESTINATION SELECT [2] ; RAM ACC ; RAM Y BUS ; RAM STATUS ; ACC RAM SORA: H#O EQU EQU Η#2 SORY: EQU Η#3 SORS: EQU SOAR: H#4 SODR: EQU H#6 ; D RAM ; I ; 0 Η#7 RAM SOIR: EQU H#8 SOZR: EQU RAM ; D(UE) RAM ; D(SE) RAM ; RAM RAM SOZER: EQU H#9 SOSER: EQU H∦A SORR: EQU H#B : : 74X,1VB#1,B#01,3VB#010, ;SINGLE OPERAND RAM 1V, B#10,4V, 4V, 5V%,4X,2VB#11,2VB#11,24X SOR: DEF 1 ; MODE, QUAD, OPCODE, SOURCE - DEST, REGISTER ; [M] [1] [2] [R] ; ; SOURCE (R/S) [3] H#4 ; ACC H#6 ; D SOA: EQU SOD: EQU SOI: EQU H#7 ; I EQU H#8 H#8 ; 0 H#9 ; D(UE) H#A ; D(SE) SOZ: SOZE: EQU SOSE: EQU DESTINATION [4]

: ; Y BUS NRY: EQU D#0 ; ACC EQU D#1 NRA: ; STATUS NRS: EQU D#4 NRAS: EQU D#5 ; ACC, STATUS ; ; SONR: DEF 74X.1VB#1.B#01.3VB#010. : SINGLE OPERAND NON⊷RAM 1 1V, B#11,4V, 4V, 5V%, 4X,2VB#11,2VB#11,24X ; MODE, QUAD, OPCODE, SOURCE, DESTINATION ; LM] [1] [3] [4] : \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; EJECT : TWO OPERAND INSTRUCTIONS \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; ; OPCODES [5] ; ; SUBR: EQU H#O ; S minus R EQU : S minus R with carry SUBRC: H#1 SUBS: EQU ; R minus S H#2 ; R minus S with carry SUBSC: EQU H#3 ; R plus S ADD: EQU H#4 R plus S with carry ADDC: EQU Η#5 ; R . S AND: EQU H#6 ; R . S NAND: EQU H#7 ; R EXOR: EQU H#8 S ; R + S NOR: EQU H#9 ; R + S OR: EQU H∦A R S EXNOR: H∦B EQU ; : : SOURCE=DESTINATION [6] : R S DEST ; RAM ACC ; RAM I TORAA: EQU Η#Ο ACC ACC TORIA: EQU H#2 ACC ; D TODRA: EQU H#3 RAM ACC ; RAM TORAY: EQU H#8 ACC Y BUS Y BUS TORIY: EQU H∦A : RAM I ; D RAM Y BUS TODRY: EQU H∦B ; RAM ACC TORAR: EQU H∦C RAM ; RAM I TORIR: EQU H∦E RAM ; D RAM TODRR: EQU H#F RAM ; TOR1: DEF 74X.1VB#1.B#01.3VB#010. : TWO OPERAND RAM (1) 4V. 5V%, 4X, 2VB#11, 2VB#11, 24X 1 1V, B#00,4V, ; MODE, QUAD, SOURCE → DEST, OPCODE, REGISTER ; I M I [6] [5] [R] ; \*\*\*\*\*\*\*\*\*\*\* ; ; : THE [1] IN THE COMMENT BELOW THE VARIABLE-FIELD REFERS TO THE ALLOWED

: MNEMONIC GROUP. EXAMPLE: MODE REFERS VIA [M] TO THE BYTE-WORD SELECT. : EXAMPLE: THE ALLOWED OPCODE SUBSTITUTIONS IN TOR1 COME FROM GROUP [5] : WHILE THE ALLOWED SOURCE - DESTINATIONS COME FROM GROUP [6]. EJECT : : : SOURCE-DESTINATION [7] R S DEST : EOU H#1 ; D ACC RAM TODAR: EQU RAM TOAIR: H#2 : ACC I TODIR: EOU H#5 ; D Ι RAM : : 74X.1VB#1.B#01.3VB#010. : TWO OPERAND RAM (2) TOR2: DEF 1V, B#10,4V. 4v. 5V%, 4X, 2VB#11, 2VB#11, 24X 1 ; MODE, QUAD, SOURCE - DEST, OPCODE, REGISTER ; : [M] [7] [5] [R] \*\*\*\*\*\*\*\* ; ; SOURCE [8] R S ; ; TODA: EQU H#1 ACC ; D TOAT: EQU H#2 ; ACC Τ ; D TODI: EOU H#5 Ι \*\*\*\*\*\*\* ; TWO OPERAND NON-RAM TONR: DEF 74X, 1VB#1, B#01, 3VB#010, 4V, 1 1V, B#11,4V, 5V%, 4X, 2VB#11, 2VB#11, 24X ; MODE, QUAD, SOURCE, OPCODE, DESTINATION ; [M] [8] [5] [4] ; \*\*\*\*\* \*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\* : EJECT : SHIFT INSTRUCTIONS : ; ; ; DIRECTION AND INPUT [9] : ; UP O SHUPZ: EQU H#0 SHUP1: EQU H#1 : UP 1 : UP QLINK SHUPL: EQU Η#2 ; DOWN O SHDNZ: EQU H#4 SHDN1: EQU H#5 ; DOWN 1 SHDNL: EQU H#6 ; DOWN QLINK : DOWN QC SHDNC: EQU H#7 : DOWN QN QOVR SHDNOV: EQU H#8 ; : : SOURCE [10] SHRR: EQU H#6 RAM ; RAM ; D SHDR: EQU Η#7 RAM ;

; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* : 74X, 1VB#1, B#01, 3VB#010, SHFTR: DEF : SHIFT RAM άv. / 1V. B#10.4V. 5V%. 4X.2VB#11.2VB#11.24X ; MODE, QUAD, SOURCE, DIRECT-INPT, REGISTER ; [M] [10] [9] [R] ; ; ; ; SOURCE [11] ; : ; ACC SHA: EQU H#6 SHD: EQU Η#7 ; D ; ; ; 74X, 1VB#1, B#01, 3VB#010, ; SHIFT NON-RAM SHFTNR: DEF 1 1V, B#11,4V, 4V, 5V%, 4X,2VB#11,2VB#11,24X ; MODE.QUAD.SOURCE.DIRECT-INP.DESTINATION ; 1 [11] [9] [4](NRY: NRA ONLY) ГМЛ \*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\* \*\*\*\*\* \*\*\*\*\*\*\*\* : : EJECT **:**ROTATE INSTRUCTIONS \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* : SOURCE-DESTINATION [12] RTRA: EQU H#C ; RAM ACC EQU ; RAM RTRY: H#E Y BUS RTRR: EQU H#F : RAM RAM ; ; : ; ROTATE RAM (1) 74X,1VB#1,B#01,3VB#010, ROTR1: DEF 1V. B#00.4V.4V. 5V%. 4X.2VB#11.2VB#11.24X 1 ; MODE, QUAD, N. SOURCE - DEST. REGISTER ; [M] [N] [12] [R] ; \*\*\*\*\*\*\*\*\*\*\*\* ; ; SOURCE-DESTINATION [13] ; : ; ACC RTAR: EQU H#0 RAM ; D **RTDR:** EQU H#1 RAM ; ; : ROTR2: DEF 74X, 1VB#1, B#01, 3VB#010, : ROTATE RAM (2) 1V. B#01.4V.4V. 5V%. 4X.2VB#11.2VB#11.24X 1 ; MODE, QUAD, N, SOURCE → DEST, REGISTER ;

[M] [N] [13] [R] ; \*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* : : ; SOURCE DESTINATION 114] D#24 ; D Y BUS D#25 ; D ACC RTDY: EQU RTDA: EQU : ACC Y BUS EQU RTAY: D#28 ; ACC ACC RTAA: EQU D#29 \*\*\*\*\*\*\* 74X, 1VB#1, B#01, 3VB#010, ; ROTATE NON⇔RAM ROTNR: DEF 1V, B#11, 4V, H#C, 5V%, 4X, 2VB#11, 2VB#11, 24X 1 : MODE, QUAD, N, FIXED CODE, DESTINATION ; [M] [N] [14] ; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\* ; EJECT : \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\* : BIT ORIENTED INSTRUCTIONS : ; OPCODES [15] : H#D ; SET RAM, BIT N SETNR: EQU H#E ; RESET RAM, BIT N H#F ; TEST RAM, BIT N ; RESET RAM, BIT N KSTNR: EQU TSTNR: EQU 74X, 1VB#1, B#01, 3VB#010, ; BIT ORIENTED RAM (1) BOR1: DEF 1V, B#11, 4V, 4V, 5V%, 4X, 2VB#11, 2VB#11, 24X 1 ; MODE, QUAD, N. OPCODE, REGISTER ; [M] [N] L15] [R] ; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\* ; ; : ; UPCODES [16] : LD2NR: EQU H#C ; 2<sup>^</sup>N →→→ RAM EQU H♯D ; 2"N --- RAM LDC2NR: ; RAM + 2"N - RAM EQU H#E A2NR: : RAM - 2"N - RAM S2NR: EQU H#F : BOR2: DEF ; BIT ORIENTED RAM (2) 74X,1VB#1,B#01,3VB#010, 1 1V, B#10,4V,4V, 5V%, 4X,2VB#11,2VB#11,24X ; MODE, QUAD, N, OPCODE, REGISTER ; [M] [N] [16] [R] ; EJECT

```
;
;
;
;
 OPCODES [17]
:
TSTNA:
             EQU
                          D#0
                                ; TEST ACC, BIT N
RSTNA:
             EQU
                          D#1
                                ; RESET ACC, BIT N
SETNA:
             EQU
                          D#2
                                ; SET ACC, BIT N
A2NA:
             EQU
                          D#4
                                : ACC + 2"N -- ACC
                                 : ACC - 2"N -ACC
S2NA:
             EQU
                          D#5
LD2NA:
                                : 2"N -- ACC
             EQU
                          H#6
                                : 2"N --- ACC
LDC2NA:
             EQU
                          D#7
TSTND:
                                 : TEST D. BIT N
             EQU
                          D#16
                          D#17
                                 ; RESET D, BIT N
RSTND:
             EQU
                                 : SET D. BIT N
SETND:
             EQU
                          D#18
A2NDY:
                          D#20
                                ; D + 2"N = Y BUS
             EQU
                                D - 2"N - Y BUS
S2NDY:
                          D#21
             EQU
                                 : 2"N ⊷∽ Y BUS
LD2NY:
                          D#22
             EQU
LDC2NY:
                                 : 2"N --- Y BUS
             EQU
                          D#23
;
:
 :
BONR: DEF
             74X,1VB#1,B#01,3VB#010,
                                        : BIT ORIENTED NON-RAM
             1V, B#11,4V,B#1100, 5V%, 4X,2VB#11,2VB#11,24X
1
;
             MODE, QUAD, N, FIXED CODE, OPCODE
;
             [M]
                  [N]
                                 [17]
:
    **************
                  ***
:
EJECT
;
 ROTATE AND MERGE
:
 ;
:
 SOURCE=DEST SELECT [U,S,MASK=DEST] [18]
;
;
                                   KOT
                                       NON→ROT MASK→DEST
MDAT:
             EQU
                          H#7
                                 ; D
                                       ACC
                                              I
MDAR:
             EQU
                                 ; D
                                       ACC
                          H#8
                                              RAM
MDRI:
             EQU
                          H#9
                                 ; D
                                       RAM
                                              Ι
                                 ; D
MDRA:
             EQU
                          H#A
                                        RAM
                                              ACC
                                 ; ACC
MARI:
             EQU
                          H#Ċ
                                       RAM
                                              Т
MRAI:
                                 ; RAM
             EQU
                          H#E
                                       ACC
                                              Ι
;
;
 :
             74X, 1VB#1, B#01, 3VB#010,
ROTM: DEF
                                        :ROTATE AND MERGE
1
             1V, B#01,4V,4V,
                                 5V%, 4X, 2VB#11, 2VB#11, 24X
;
             MODE, QUAD, N, SOURCE - DEST, REGISTER
;
             [M] [N] [18]
                                [R]
;
 **************
;
;
;
```

;

; ; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; ROTATE AND COMPARE ; \*\*\*\*\* ; ; ROT.SRC(U) → NON ROT.SRC(S)/DEST → MASK(S)[19] : ; CDAI: EQU H#2 : D ACC Т CDRI: EQU H#3 ; D RAM Т CDRA: EQU H#4 ; D RAM ACC CRAI: EQU H#5 : RAM ACC Ι ; ; ; ; ROTATE AND COMPARE 74X, 1VB#1, B#01, 3VB#010, ROTC: DEF 5V%, 4X,2VB#11,2VB#11,24X 1 1V, B#01,4V,4V, ; MODE, QUAD, N. SOURCE-DEST-MASK, REGISTER ; [M] [N] [19] [R] ; \*\*\*\*\* \*\*\*\*\*\* \*\*\*\*\*\* \*\*\* ; EJECT ; ; ; PRIORITIZE ; ; ; SOURCE L20] ; ACC PRT1A: EQU H#7 PR1D: EQU H#9 ; D : ; ; DESTINATION [21] PR1A: EOU H#8 : ACC PR1Y: EQU H∦A Y BUS PR1R: EQU H#B ; RAM : 74X.1VB#1,B#01,3VB#010, PRT1: DEF ; RAM ADDR MASK(S) 4V, 5V%, 4X, 2VB#11, 2VB#11, 24X 1 1V, B#10,4V, ; ; MODE, QUAD, DESTINATION, SOURCE, REG-MASK [M] [21] [20] [R] ; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; ; ; ; DESTINATION [23] PR2A: EQU H#O ; ACC H#2 ; Y BUS PR2Y: EQU : : MASK (S) [22]

;

H#8 ; ACC H#A ; O PRA: EQU PRZ: EQU PRI: EQU H#B ; I ; ; 74X.1VB#1.B#01.3VB#010. : PRIORITIZE RAM PRT2: DEF 1V, B#10,4V, 4V, 5V%, 4X,2VB#11,2VB#11,24X 1 ; MODE, QUAD, MASK, DEST, REG-SOURCE ; [M] [22][23] [R] : \*\*\*\*\*\*\*\*\*\*\* : EJECT ; ; SOURCE (R) [24] : ; RAM ; ACC PR3R: EQU H#3 EQU PR3A: H#4 PR3D: H#6 ; D EQU : : \*\*\*\*\*\*\*\*\* : 74X,1VB#1,B#01,3VB#010, : PRIORITIZE κΑΜ PRT3: DEF 1V, B#10,4V, 4V, 5V%, 4X,2VB#11,2VB#11,24X 1 ; MODE, QUAD, MASK, SOURCE, REG -DEST ; [M] [22] L24] LRJ : : : : ; SOURCE (R) L25] H#4 ; ACC H#6 ; D PRTA: EQU PRTD: EQU ; : : PRTNR: DEF 74X, 1VB#1, B#01, 3VB#010, ; PRIORITIZE NON-RAM 1 1V, B#11,4V, 4V, 5V%, 4X,2VB#11,2VB#11,24X ; MODE, QUAD, MASK, SOURCE, DESTINATION ; LM] [22] L25] [4](NRY, NRA ONLY) : : EJECT ; : : CYCLIC REDUNDANCY CHECK : : : : CRCF: DEF 74X, 1VB#1, B#01, 3VB#010, 1 B#11001100011.5V%.4X.2VB#11.2VB#11.24X : FORWARD

; ; : \*\*\*\*\* CRCR: DEF 74X, 1VB#1, B#01, 3VB#010, 1 B#11001101001,5V%, 4X,2VB#11,2VB#11,24X ; REVERSE \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\*\*\* : : ; ; ; : ; ; NOOP ; NOOP: DEF 74X, 1VB#1, B#01, 3VB#010, H#7140, 4X, 2VB#11, 2VB#11, 24X; NO OPERATION EJECT : STATUS \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; OPCODE [26] ; SET OVR, N, C, Z SONZC: EQU 5D#3%/ EQU ; SET LINK SL: 5D#5% ; SET FLAG 1 EQU SF1: 5D#6% SF2: EQU 5D#9% ; SET FLAG 2 SF3: EQU 5D#10% ; SET FLAG 3 ; : SETST: DEF 74X, 1VB#1, B#01, 3VB#010, 1 B#011, H#BA, 5V%, 4X, 2VB#11, 2VB#11, 24X ; SET STATUS ; OPCODE : [26] : : ; OPCODE [27] RONCZ: EQU D#3% ; RESET OVR,N,C,Z RL: EQU D#5% ; RESET LINK ; RESET FLAG 1 RF1: EQU D#6% RF2: EQU D#9% ; RESET FLAG 2 RF3: EQU D#10%% ; RESET FLAG 3 ; \*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* RSTST: DEF 74X, 1VB#1, B#01, 3VB#010, B#011, H#AA, 5V%, 4X, 2VB#11, 2VB#11.24X ; RESET STATUS 1

; OPCODE ; [27] ; \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* \*\*\*\*\*\*\*\*\*\*\*\* ; EJECT : : : 74X, 1VB#1, B#01, 3VB#010, ; SAVE STATUS⊷RAM SVSTR: DEF 1V, B#10, H#7A, 5V%, 4X, 2VB#11, 2VB#11, 24X 1 ; MODE, QUAD, FIXED, RAM ADDRESS/DEST : [ M ] [R] : SAVE STATUS NON⊶RAM SVSTNR: DEF 74X, 1VB#1, B#01, 3VB#010, 1V, B#11, H#7A, 5V%, 4X, 2VB#11, 2VB#11, 24X : MODE, QUAD, FIXED, DESTINATION [4](NRY, NRA ONLY) LM] \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* ; ; TEST STATUS : : : ; UPCODE (CT) TNOZ: EQU D#0 : TEST (N OVR) + Z , TNO: EQU ; TEST N OVR D#2 ΤΖ: EQU D#4 ; TEST Z TOVR: EQU D#6 ; TEST OVR TLOW: EQU D#8 ; TEST LOW TC: EQU D#10 ; TEST C ; TEST Z + C TZC: EQU D#12 ; TEST N TN: EQU D#14 TL: EQU D#16 : TEST LINK ; TEST FLAG 1 D#18 **TF1:** EQU ; TEST FLAG 2 TF2: EQU D#20 : TEST FLAG 3 **TF3:** EQU D#22 ; TEST: DEF 74X, 1VB#1, B#01, 3VB#010 1 ,B#011,H#9A,5V%, 4X,2VB#11,2VB#11,24X ; TEST STATUS : \*\*\*\*\*\*\*\*\*\*\*\*\*\*\* IMMED: DEF 74X,1VB#1,B#01010,16V\$,4X,2VB#11,2VB#11,24X EJECT : Am29540 FFT ADDRESS GENERATOR

| ; * * * * * * *        | * * * *    | * * * *      | * * * * * * * * *                                                    |
|------------------------|------------|--------------|----------------------------------------------------------------------|
| ;                      | FOU        | <b>D</b> #00 |                                                                      |
| ADR.HOLD:              | EQU        | B#00<br>D#01 |                                                                      |
| ADR.RST:               | EQU        | B#01         |                                                                      |
| ADR.LOAD:              | EQU        | B#10         |                                                                      |
| ADR.INC:<br>;          | EQU        | B#11         |                                                                      |
| ,<br>RADIX.2:          | EQU        | B#U          |                                                                      |
| RADIX.4:               | EQU        | RADIX.2*     | •                                                                    |
| PSD:                   | EQU        | B#0          | PRESCRAMBLED DATA                                                    |
| NORM.ORD:              | EQU        | PSD*         | NORMAL ORDER                                                         |
| DIT:                   | EQU        | B#1          |                                                                      |
| DIF:                   | EQU        | DIT*         |                                                                      |
| ADR.OE:                | EQU        | B#0          |                                                                      |
| ;                      |            | •• # 4       |                                                                      |
| ADR1:                  | EQU        | H#O          | ;DATA ADDRESS 1 FOR RADIX 2/4                                        |
| ADR2:                  | EQU        | H#1          | ;DATA ADDRESS 2 FOR RADIX 2/4                                        |
| ADR3:                  | EQU        | H#2          | ;DATA ADDRESS 3 FOR RADIX 4                                          |
| ADR4:                  | EQU        | H#3          | ;DATA ADDRESS 4 FOR RADIX 4                                          |
| ALT.ADR1:              | EQU        | H#4          | ALTERNATE ADDRESS 1 FOR RADIX 2/4                                    |
| ALT.ADR2:              | EQU        | H#5          | ALTERNATE ADDRESS 2 FOR RADIX 2/4                                    |
| ALT.ADR3:<br>ALT.ADR4: | EQU<br>EQU | H#6<br>H#7   | ;ALTERNATE ADDRESS 3 FOR RADIX 4<br>;ALTERNATE ADDRESS 4 FOR RADIX 4 |
| CONST1:                | EQU        | H#8          | CONSTANT ADDRESS 1 FOR RADIX 2/4 OR SHADING                          |
| CONST2:                | EQU        | H#9          | CONSTANT ADDRESS 2 FOR RADIX 4                                       |
| CONST3:                | EQU        | H#A          | CONSTANT ADDRESS 3 FOR RADIX 4                                       |
| CONST1.S:              | EQU        | H#B          | ; INVERTED CONSTANT ADDRESS 1 FOR SHADING                            |
| RVI.ADR1:              | EQU        | H#C          | REAL VALUE INPUT ADDRESS 1                                           |
| RVI.ADR2:              | EQU        | H#D          | REAL VALUE INPUT ADDRESS 2                                           |
| RVI.ADR3:              | EQU        | H#E          | REAL VALUE INPUT ADDRESS 3                                           |
| RVI.ADR4:              | EQU        | H#F          | ;REAL VALUE INPUT ADDRESS 4                                          |
| ;                      |            |              |                                                                      |
| • Am20520 PTPEL        | * * * * *  | * * * * *    | * * * * * * * * *                                                    |
| ; Am29520 PIPEL        | INE REGI   | * * * * *    | * * * * * * * *                                                      |
| ;                      |            |              |                                                                      |
| ADP.SHFT:              | EQU        | B#00         | ;SHIFT ADDR THROUGH PIPELINE                                         |
| ADP.LDB:               | EQU        | B#01         | ;LOAD & SHIFT SECOND HALF OF PIPELINE                                |
| ADP.LDA:               | EQU        | B#10         | ;LOAD & SHIFT FIRST HALF OF PIPELINE                                 |
| ADP.HOLD:              | EQU        | B#11         | ;NO OP                                                               |
| ;<br>ADP.A1:           | EQU        | B#11         | REG A1 TO OUTPUT                                                     |
| ADP.A2:                | EQU        | B#10         | REG A2 TO UUTPUT                                                     |
| ADP.B1:                | EQU        | B#01         | REG B1 TO OUTPUT                                                     |
| ADP.B2:                | EQU        | B#00         | REG B2 TO OUTPUT                                                     |
| ;                      |            |              |                                                                      |
| ADP.OE: EQU            | B#0        | ;OUTPUT      | ENABLE                                                               |
| ;<br>; ADDRESS GENER   | ATOD THE   | TRUCTTONS    |                                                                      |
| , ADDRESS GENER        | AION IND   | -INCOLIONS   | ,                                                                    |
| ADG.HOLD: DEF          | 74X,1VB    | #1,B#1000    | )1,3X,1VB#1,1VB#0,1VB#1,ADR.HOLD,4VH#0,8X,                           |
| /                      |            | 2VB#11,24    |                                                                      |
| ADG.RST: DEF           |            |              | )1,3X,1VB#1,1VB#0,1VB#1,ADR.RST,4VH#0,8X,                            |
|                        |            | 2VB#11,24    |                                                                      |
| ADG.LOAD: DEF          | /4X,1VB    | #I,B#1000    | )1,3X,1VB#1,1VB#0,1VB#1,ADR.LOAD,4VH#0,8X,                           |

| ADG.INC<br>/       | • 011      |                 | ,2VB#11,               |                   | 0,1VB#1,ADR.INC,4VH#0,8X,                                         |
|--------------------|------------|-----------------|------------------------|-------------------|-------------------------------------------------------------------|
| EJECT              |            |                 |                        |                   |                                                                   |
| ;                  |            |                 |                        |                   |                                                                   |
| ;                  |            |                 |                        |                   |                                                                   |
| ; • • •            |            |                 |                        |                   |                                                                   |
| • Am 201           | O MICRO    | PROGRAM         | CONTROLLI              | ER INSTRUCTION SE | T 120]                                                            |
| :                  | 0 110110   | , woonnin       | oon incele             |                   |                                                                   |
| ; * * *            | * * *      | * * * *         | * * * * *              | * * * * * * * *   | * * * * *                                                         |
| ;                  |            |                 |                        |                   |                                                                   |
| ;                  |            |                 |                        |                   |                                                                   |
| ; CONDI            | TION CO    | DE MULTI        | PLEXER                 |                   |                                                                   |
| ;<br>TE UTCU       |            | EQU             | B#∪                    | CONDITION PREF    | TV                                                                |
| IF.HIGH<br>IF.LOW: | •          | EQU             | IF.HIGH                | ,                 | 1 X                                                               |
| UNCOND:            |            | EQU             | H#0                    | ;UNCONDITIONAL    | BRANCH                                                            |
| NEW.PRO            | С:         | EQU             | H#1                    | NEW PROCESS CO    |                                                                   |
| LNT.ACK            |            | EQU             | H#2                    | INTERRUPT ACK     |                                                                   |
| FFT.ITC            |            | EQU             | H#3                    | Am29540 ITERAT    |                                                                   |
| FFT.DON            |            | EQU             | H#4                    | Am29540 FFT CO    |                                                                   |
| ADR.CT:            |            | EQU             | H#5                    | ;Am29116 CONDIT   | ION TEST                                                          |
| RALU.S:            |            | EQU             | H#8                    | ;REAL ALU SIGN    |                                                                   |
| RALU.OV            | :          | EQU             | H#9                    | ;REAL ALU OVERF   | LOW                                                               |
| RALU.Z:            |            | EQU             | H#A                    | ;REAL ALU ZERO    |                                                                   |
| RALU.C:            |            | EQU             | H∦B                    | ;REAL ALU CARRY   |                                                                   |
| IALU.S:            |            | EQU             | H#C                    | ;IMAG ALU         |                                                                   |
| LALU.OV            |            | EQU             | H#D                    |                   |                                                                   |
| IALU.Z:<br>IALU.C: |            | EQU<br>EQU      | H#E<br>H#F             |                   |                                                                   |
| ;                  |            | 1.60            |                        |                   |                                                                   |
| JZ:                | DEF        | 124X.H          | # U                    |                   | ; RESET STACK, MICROPC, ADDRESS                                   |
| CJS:               | DEF        | 107X,1          | VB#0,4V,               | 2V\$H#FFF,H#1     | ; COND JUMP SUBROUTINE                                            |
| JSR:               | DEF        | 107X,B          | #U, UNCONI             | ),12V\$H#FFF,H#1  | ; UNCOND JUMP SUBROUTINE                                          |
| JMAP:              | DEF        | 112X <b>,</b> 1 | 2V\$H#FFF              | ,H#2              | ; UNCOND JUMP TO MEMORY MAP (Di)                                  |
| CJP:               | DEF        |                 |                        | 2V\$H#FFF,H#3     | ; COND JUMP PIPELINE                                              |
| JMP:               | DEF        |                 |                        | ),12V\$H#FFF,H#3  | ; UNCOND JUMP PIPELINE                                            |
| PUSH:              | DEF        |                 | VB#0,4V,               |                   | ; PUSH STACK, LOAD REG MAYBE                                      |
| JSRP:              | DEF        |                 | 2V\$H#FFF              |                   | ; JUMP SUB FROM REG (F) OR PIPE(T)                                |
| CJV:<br>JRP:       | DEF<br>DEF |                 | VB#0,4V,<br>2V\$H#FFF. | 12V\$H#FFF,H#6    | ; COND JUMP TO VECTOR INTER (DI)<br>: JUMP TO KEG (F) OR PIPE (T) |
| JRP:<br>KFCT:      | DEF<br>DEF |                 | 2V\$H#FFF<br>2V\$H#FFF |                   | ; DO LOOP REPEAT UNTIL CTR=0-STACK                                |
| RPCT:              | DEF        |                 | 2V\$H#FFF.             |                   | : DO LOOP UNTIL CTR= $0 \approx PIPE$                             |
| CRTN:              | DEF        |                 |                        | 12V\$H#FFF,H#A    | COND RETURN, POP STACK (T)                                        |
| RTN:               | DEF        |                 |                        | D,12V\$H#FFF,H#A  | UNCOND RETURN                                                     |
| CJPP:              | DEF        | 107X,1          | VB#0,4V,               | 12V\$H#FFF,H#B    | ; COND JUMP PIPELINE, POP STACK                                   |
| LDCT:              | DEF        | •               | 2V\$,H#C               |                   | ; LOAD REGISTER, CONTINUE                                         |
| L00P:              | DEF        | 124X,H          |                        |                   | ; DO LOOP UNTIL TEST=T - STACK                                    |
| CONT:              | DEF        | 124X,H          |                        |                   | ; CONTINUE                                                        |
| ΤWΒ:               | DEF        | 107X <b>,</b> 1 | vB#U,4V,               | 12V\$H#FFF,H#F    | ; THREE WAY BRANCH                                                |
| :                  |            |                 |                        |                   |                                                                   |

; : Am2925 CYCLE LENGTH SELECT [21] ; System Clock Generator and Driver : : ; THE FOLLOWING ARE THE CYCLE LENGTH CODES (PRELIM) EXAMPLE CYCLE (1 OF 4) 100ns AT 30MHz 160ns AT 25MHz : 3 CLOCK PERIODS CLA: EQU Q#0 CLB: EQU 0#1 ; 4 ; 5 200ns AT 25MHz CLC: EQU Q#5 200ns AT 30MHz ; 6 CLD: EQU Q#7 280ns AT 25MHz ; 7 CLE: EQU Q#3 320ns AT 25MHz ; 8 CLF: EQU Q#2 CLG: EQU Q#6 ; 9 300ns AT 30MHz ; 10 CLOCK PERIODS CLH: EQU Q#4 322ns AT 31MHz (max crystal frequency is 311MHz) : : OTHER CONTROL LINES FOR THE Am2925 ; INCOMPLETELY DEFINED AT PRESENT (IN THIS FILE) FIRST.25: EQU B#1 ; LAST.25: EQU B#0 ; HALT: EQU B#00 ; NOHALT: EQU B#00 ; SINGLSTP: EQU B#00 ; RUN: EQU B#00 ; WATTREO: EOU B#0 ; NOWAITRO: EQU B#1 ; READY: EQU B#0 ; NOTREADY: EQU B#1 ; INITIALIZE: EOU B#0 ; NO.INIT: EQU B#1 ; : EJECT : : MISCELLANEOUS CONTROLS FOR THE DSP ;MEMORY WRITE ENABLE WE: EQU B#0 NO WRITE NWE: EQU WE\* RD.MEM: DEF 31X, NWE, 31X, NWE, 64X WR.CMPX: DEF 31X, WE, 31X, WE, 64X WR.REAL: DEF 31X, WE, 31X, NWE, 64X WR.IMAG: DEF 31X, NWE, 31X, WE, 64X SEL.116: B#01 EQU SEL.540: B#10 EQU NO.ADDR: EQU B#11

2-19

; BUFCD: EQU B#1 BUFFER CHIP DISABLE BUFEN: BUFCD\* EQU CF.LOAD: EQU B#0 ;ENABLE NEW ROM ADDRESS CF.LOAD\* CF.HOLD: EQU : ; DATA PRESCALING DIV.BY.1: DEF NO.SHIFT, 30X, NO.SHIFT, 94X SHIFT.R1, 30X, SHIFT.R1, 94X DIV.BY.2: DEF DIV.BY.4: DEF SHIFT.R2, 30X, SHIFT.R2, 94X DIV.BY.8: DEF SHIFT.R3, 30X, SHIFT.R3, 94X SP: EQU B#1 SINGLE PRECISION (16 BITS) DP: EQU B#0 ;DOUBLE PRECISION (32 BITS-IMAG:REAL) : INTRRUPT: DEF 104X.B#1.23X :GENERATE EXTERNAL INTERRUPT : ;FIELD POSITIONS MISC: DEF 64X, 1VB#1, 39X, 1VB#0, 2VB#00, 21X NO.OP: DEF NO.SHIFT.MOVE.NC.A1.A1 :DO NOTHING 1 ,A1.HOLD,A2.HOLD,A3.HOLD,B1.HOLD,B2.HOLD,B3.HOLD ,MIO.IN, DIO.IN, NWE 1 1 , NO.SHIFT, MOVE.NC, A1, A1 1 ,A1.HOLD,A2.HOLD,A3.HOLD,B1.HOLD,B2.HOLD,B3.HOLD 1 ,MIO.IN, DIO.IN, NWE ,SP,MP.TRUNC,MP.FRAC,MP.MSP 1 1 ,MY.IN,MXY.2C,MX.CONST ,1X,SEL.540,B#111 1 , 3X, DIT, RADIX. 2, NORM. ORD, ADR. HOLD, ADR 1 1 1 ,8X,ADP.HOLD,ADP.A1,24X : RD.CMD: DEF NO.SHIFT, MOVE.CC, A1, A1 ; READ MODE INTO 540 AND 29116 ,A1. HOLD, A2. HOLD, A3. HOLD, B1. HOLD, B2. HOLD, B3. HOLD 1 ,MIO.IN,DIO.IN,NWE 1 1 ,NO.SHIFT,MOVE.CC,A1,A1 1 ,A1.HOLD,A2.HOLD,A3.HOLD,B1.HOLD,B2.HOLD,B3.HOLD 1 ,MIO.IN,DIO.IN,NWE ,SP,MP.TRUNC,MP.FRAC,MP.MSP 1 1 ,MY.IN,MXY.2C,MX.CONST ,1X, NO. ADDR, B#101 1 1 , 3X, DIT, RADIX. 2, NORM. ORD, ADR. HOLD, ADR 1 1 ,8X,ADP.HOLD,ADP.A1,24X ; MAKE.ONE: DEF NO.SHIFT, MOVE.FC, A1, A1 ;FORCE CARRY INTO IMAG ALU 1 ,A1. HOLD, A2. HOLD, A3. HOLD, B1. HOLD, B2. HOLD, B3. HOLD 1 ,MIO.IN,DIO.IN,NWE 1 ,NO.SHIFT,MOVE.CC,ZERO,A1 ; 0 + CRY = 1,A1.HOLD,A2.EQ.AU,A3.HOLD,B1.HOLD,B2.HOLD,B3.HOLD 1 1 ,MIO.IN,DIO.IN,NWE , DP, MP. TRUNC, MP. FRAC, MP. MSP 1 1 .MY.IN.MXY.2C.MX.CONST ,54X 1 : END

### APPENDIX 3 AmDSP DIGITAL SIGNAL PROCESSOR SOURCE FILE

. . . . . . . . . . . . . . . . . : : ; IDLE LOOP WAITING FOR PROCESS INSTRUCTION START:: NO.OP & JMP \$ + 1 : JUMP INSTR FOR JAMMING ; WAIT FOR INSTR STROBE NO.OP & CJP IF.LOW.NEW.PROC.\$ NO.OP & CJP IF.HIGH.NEW.PROC.\$ : WAIT FOR STROBE TO LATCH DATA RD.CMD & JMAP : VECTOR TO COMMANDED PROCESS SET INTERRUPT FLAG AND WAIT FOR AN ACKNOWLEDGE FINISH: ; WAIT FOR ACKNOWLEDGE INTRRUPT & NO.OP & CONT : NO INTR WHILE DEBUGGING /; & CJP IF.HIGH, INT.ACK, \$ NO.OP & JMP START EJECT \* \* \* \* \* \* \* \* \* \* \* \* . . . . . . . . . . . . . . . : ; FFT PROGRAM ; SIZE HANDLED BY Am29540 FFT:: \*\*\* RESET 29540 AND DO NOTHING ADG.RST & RD.MEM & DIV.BY.2 /& R.MOVE & I.MOVE & MSPROD & MISC /& CONT \*\*\* FILL PIPELINE BEFORE WRITING ADG.HOLD , DIT, RADIX.2, NORM.ORD, ADR2, ADP.LDB /& RD.MEM & DIV.BY.2 /& R.MOVE & I.MOVE & MSPROD & MISC /& CONT \*\*\* READ B OPERAND & COEFFICIENT ADG.HOLD CF.LOAD, DIT, RADIX.2, NORM.ORD, CONST1, ADP.HOLD, ADP.B1 /& RD.MEM & DIV.BY.2 /& R.MOVE , , , , B1.EQ.DI, , , , DIO.IN /& I.MOVE , , , , B1.EQ.DI, , , , DIO.IN /& MSPROD /& MISC /& CONT \*\*\* REAL\*COS ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR1, ADP.LDA /& RD.MEM & DIV.BY.2 /& R.MOVE , , , , B1.HOLD, , M.EQ.B1 /& I.MOVE , , , , B1.HOLD, , , MIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN

/& MISC /& CONT \*\*\* REAL\*SIN, READ A OPERAND ADG.INC CF.HOLD. DIT. RADIX.2, NORM.ORD. . ADP.HOLD. ADP.A1 /& RD.MEM & DIV.BY.2 /& R.MOVE , A1.EQ.DI, , , , , , M.EQ.B1, DIO.IN /& I.MOVE , A1.EQ.DI, , , B1.HOLD, , , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC /& CONT \*\*\* IMAG\*COS ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR2, ADP.LDB /& RD.MEM & DIV.BY.2 /& R.ADD A1, MSP, A1.HOLD, , A3.EQ.MP, , B2.EQ.AU, , MIO.IN /& I.MOVE , A1.HOLD, , , B1.HOLD, , , M.EQ.B1 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFCD /& MISC /& CONT \*\*\* IMAG\*SIN & READ B ADG.HOLD CF.LOAD, DIT, RADIX.2, NORM.ORD, CONST1, ADP.HOLD, ADP.B1 /& RD.MEM & DIV.BY.2 /& R.MOVE , A1.HOLD, , A3.HOLD, B1.EQ.DI, B2.HOLD, , MIO.IN, DIO.IN /& I.SUBS A1, MSP, A1. HOLD, , A3.EQ.MP, B1.EQ.DI, B2.EQ.AU, , M.EQ.B1, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD /& MISC /& CONT \*\*\* REAL\*COS ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR1, ADP.LDA /& RD.MEM & DIV.BY.2 /& R.SUBS A1,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, B2.HOLD, , M.EQ.B1 /& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, B1.HOLD, B2.EQ.AU, B3.EQ.MP, MIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN /& MISC /& CONT \*\*\* PASS 1 LOOP CAN DO SHADING IT1.LOOP: \*\*\* REAL\*SIN, READ A ADG.INC CF.HOLD, DIT, RADIX.2, NORM.ORD, , ADP.HOLD, ADP.A1 /& RD.MEM & DIV.BY.2 /& R.ADD B2,MSP, A1.EQ.DI, A2.HOLD, , , B2.EQ.AU, B3.EQ.MP, M.EQ.B1, DIO.IN /& I.SUBS A1,B3, A1.EQ.DI, A2.EQ.AU, A3.HOLD, B1.HOLD, B2.HOLD, , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC /& CONT \*\*\* IMAG\*COS, WRITE A - B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR2, ADP.LDB, ADP.B2 /& WR.CMPX /& R.ADD A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, , B2.EQ.AU, B3.HOLD, , D.EQ.B2 /& I.ADD A2,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, , , M.EQ.B1, D.EQ.B2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFCD /& MISC /& CONT \*\*\* IMAG\*SIN, READ B ADG.HOLD CF.LOAD, DIT, RADIX.2, NORM.ORD, CONST1, ADP.HOLD, ADP.B1 /& RD.MEM & DIV.BY.2

/& R.SUBS A2, B3, A1.HOLD, A2.EQ.AU, A3.HOLD, B1.EQ.DI, B2.HOLD, , , DIO.IN /& I.SUBS A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, B1.EQ.DI, B2.EQ.AU, ,M.EQ.B1, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD /& MISC /& CONT \*\*\* REAL\*COS. WRITE A + B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR1, ADP.LDA, ADP.A2 /& WR.CMPX /& R.SUBS A1,A3, , A2.EQ.AU, , B1.HOLD, B2.HOLD, , M.EQ.B1, D.EQ.A2 /& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, B1.HOLD, B2.EQ.AU, B3.EQ.MP, MIO.IN, D.EQ.A2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN /& MISC /& CJP IF.LOW.FFT.ITC.IT1.LOOP BTF.LOOP: \*\*\* REAL\*SIN, READ A ADG.INC CF.HOLD, DIT, RADIX.2, NORM.ORD, , ADP.HOLD, ADP.A1 /& RD.MEM & DIV.BY.2 /& R.ADD B2,MSP, A1.EQ.DI, A2.HOLD, , , B2.EQ.AU, B3.EQ.MP, M.EQ.B1, DIO.IN /& I.SUBS A1,B3, A1.EQ.DI, A2.EQ.AU, A3.HOLD, B1.HOLD, B2.HOLD, , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC /& CONT \*\*\* IMAG\*COS, WRITE A - B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR2, ADP.LDB, ADP.B2 /& WR.CMPX /& R.ADD A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, , B2.EQ.AU, B3.HOLD, , D.EQ.B2 /& I.ADD A2,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, , , M.EQ.B1, D.EQ.B2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFCD /& MISC /& CONT \*\*\* IMAG\*SIN, READ B ADG.HOLD CF.LOAD, DIT, RADIX.2, NORM.ORD, CONST1, ADP.HOLD, ADP.B1 /& RD.MEM & DIV.BY.2 /& R.SUBS A2, B3, A1.HOLD, A2.EQ.AU, A3.HOLD, B1.EQ.DI, B2.HOLD, , , DIO.IN /& I.SUBS A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, B1.EQ.DI, B2.EQ.AU, ,M.EQ.B1, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD /& MISC /& CONT \*\*\* REAL\*COS, WRITE A + B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR1, ADP.LDA, ADP.A2 /& WR.CMPX /& R.SUBS A1,A3, , A2.EQ.AU, , B1.HOLD, B2.HOLD, , M.EQ.B1, D.EQ.A2 /& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, B1.HOLD, B2.EQ.AU, B3.EQ.MP, MIO.IN, D.EQ.A2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN /& MISC /& CJP IF.LOW,FFT.DONE,BTF.LOOP :/& CJP IF.LOW.FFT.ITC.BTF.LOOP \*\*\* REAL\*SIN, READ A ADG.INC CF.HOLD, DIT, RADIX.2, NORM.ORD, , ADP.HOLD, ADP.A1 /& RD.MEM & DIV.BY.2 /& R.ADD B2,MSP, A1.EQ.DI, A2.HOLD, , , B2.EQ.AU, B3.EQ.MP, M.EQ.B1, DIO.IN /& I.SUBS A1,B3, A1.EQ.DI, A2.EQ.AU, A3.HOLD, B1.HOLD, B2.HOLD, , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC

/& CONT COUNT PASSES FOR TESTING :/& RPCT BTF.LOOP+1 : FLUSH PIPELINE \*\*\* IMAG\*COS. WRITE A - B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, NORM.ORD, ADR2, ADP.LDB, ADP.B2 /& WR.CMPX /& R.ADD A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, , B2.EQ.AU, B3.HOLD, , D.EQ.B2 /& I.ADD A2,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, , , M.EQ.B1, D.EQ.B2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS. BUFCD /& MISC /& CONT \*\*\* TMAG\*STN ADG.HOLD , DIT, RADIX.2, NORM.ORD, , ADP.HOLD /& RD.MEM & DIV.BY.2 /& R.SUBS A2, B3, A1.HOLD, A2.EQ.AU, A3.HOLD, , B2.HOLD /& I.SUBS A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, , B2.EQ.AU, ,M.EQ.B1 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD /& MISC /& CONT \*\*\* WRITE A + B\*w ADG.HOLD , DIT, RADIX.2, NORM.ORD, , ADP.LDA, ADP.A2 /& WR.CMPX /& R.SUBS A1,A3, , A2.EQ.AU, , , B2.HOLD, , , D.EQ.A2 /& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, , B2.EQ.AU, B3.EQ.MP, , D.EQ.A2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C /& MISC /& CONT ADG.HOLD , DIT, RADIX.2, NORM.ORD, , ADP.HOLD /& RD.MEM & DIV.BY.2 /& R.ADD B2,MSP, , A2.HOLD, , , B2.EQ.AU, B3.EQ.MP /& I.SUBS A1,B3, , A2.EQ.AU, A3.HOLD, , B2.HOLD /& MSPROD & MISC /& CONT \*\*\* WRITE A - B\*w ADG.HOLD , DIT, RADIX.2, NORM.ORD, , , ADP.B2 /& WR.CMPX /& R.MOVE , , , , , , , , D.EQ.B2 /& I.ADD A2,A3, , A2.EQ.AU, , , , , , D.EQ.B2 /& MSPROD & MISC /& CONT ADG.HOLD , DIT, RADIX.2, NORM.ORD /& RD.MEM & DIV.BY.2 /& R.SUBS A2,B3, , A2.EQ.AU /& I.MOVE , , A2.HOLD /& MSPROD & MISC /& CONT \*\*\* WRITE A + B\*w ADG.HOLD , DIT, RADIX.2, NORM.ORD, , , ADP.A2 /& WR.CMPX /& R.MOVE , , , , , , , , , D.EQ.A2 /& I.MOVE , , , , , , , , , D.EQ.A2 /& MSPROD & MISC

/& JMP FINISH EJECT : ; : : INVERSE FFT PROGRAM (PRE SCRAMBLED DATA) : SIZE HANDLED BY Am29540 IFFT:: \*\*\* RESET 29540 AND DO NOTHING ADG.RST & RD.MEM & DIV.BY.2 /& R.MOVE & I.MOVE & MSPROD & MISC /& CONT **\*\*\*** FILL PIPELINE BEFORE WRITING ADG.HOLD , DIT, RADIX.2, PSD, ADR2, ADP.LDB /& RD.MEM & DIV.BY.2 /& R.MOVE & I.MOVE & MSPROD & MISC /& CONT \*\*\* READ B OPERAND & COEFFICIENT ADG.HOLD CF.LOAD. DIT. RADIX.2. PSD. CONST1. ADP.HOLD. ADP.B1 /& RD.MEM & DIV.BY.2 /& R.MOVE , , , , B1.EQ.DI, , , , DIO.IN /& I.MOVE , , , , B1.EQ.DI, , , , DIO.IN /& MSPROD /& MISC /& CONT **\*\*\*** REAL\*COS OF COMPLEX MULTIPLY ADG.HOLD CF.HOLD, DIT, RADIX.2, PSD, ADR1, ADP.LDA /& RD.MEM & DIV.BY.2 /& R.MOVE , , , , B1.HOLD, , , M.EQ.B1 /& I.MOVE , , , , B1.HOLD, , , MIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN /& MISC /& CONT \*\*\* REAL\*SIN, READ A OPERAND ADG.INC CF.HOLD, DIT, RADIX.2, PSD, , ADP.HOLD, ADP.A1 /& RD.MEM & DIV.BY.2 /& R.MOVE , A1.EQ.DI, , , , , , M.EQ.B1, DIO.IN /& I.MOVE , A1.EQ.DI, , , B1.HOLD, , , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC /& CONT \*\*\* IMAG\*COS ADG.HOLD CF.HOLD, DIT, RADIX.2, PSD, ADR2, ADP.LDB /& RD.MEM & DIV.BY.2 /& R.ADD A1, MSP, A1.HOLD, , A3.EQ.MP, , B2.EQ.AU /& I.MOVE , A1.HOLD, , , B1.HOLD, , , M.EQ.B1 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFCD /& MISC /& CONT \*\*\* IMAG\*SIN, READ B ADG.HOLD CF.LOAD, DIT, RADIX.2, PSD, CONST1, ADP.HOLD, ADP.B1 /& RD.MEM & DIV.BY.2 /& R.MOVE , A1.HOLD, , A3.HOLD, B1.EQ.DI, B2.HOLD, , , DIO.IN /& I.ADD A1, MSP, A1.HOLD, , A3.EQ.MP, B1.EQ.DI, B2.EQ.AU, , M.EQ.B1, DIO.IN

/& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD /& MISC /& CONT \*\*\* REAL\*COS ADG.HOLD CF.HOLD, DIT, RADIX.2, PSD, ADR1, ADP.LDA /& RD.MEM & DIV.BY.2 /& R.SUBS A1,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, B2.HOLD, , M.EQ.B1 /& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, B1.HOLD, B2.EQ.AU, B3.EQ.MP, MIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN /& MISC /& CONT IBTF.LUP: \*\*\* REAL\*SIN. READ A ADG.INC CF.HOLD, DIT, RADIX.2, PSD, , ADP.HOLD, ADP.A1 /& RD.MEM & DIV.BY.2 /& R.SUBS B2,MSP, A1.EQ.DI, A2.HOLD, , , B2.EQ.AU, B3.EQ.MP, M.EQ.B1, DIO.IN
/& I.SUBS A1,B3, A1.EQ.DI, A2.EQ.AU, A3.HOLD, B1.HOLD, B2.HOLD, , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC /& CONT \*\*\* IMAG\*COS. WRITE A - B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, PSD, ADR2, ADP.LDB, ADP.B2 /& WR.CMPX /& R.ADD A1, MSP, A1. HOLD, A2. HOLD, A3. EQ. MP, , B2. EQ. AU, B3. HOLD, , D. EQ. B2 /& I.SUBS A2,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, , , M.EQ.B1, D.EQ.B2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFCD /& MISC /& CONT \*\*\* IMAG\*SIN, READ B ADG.HOLD CF.LOAD, DIT. RADIX.2, PSD. CONST1. ADP.HOLD. ADP.B1 /& RD.MEM & DIV.BY.2 /& R.ADD A2,B3, A1.HOLD, A2.EQ.AU, A3.HOLD, B1.EQ.DI, B2.HOLD, , , DIO.IN /& I.ADD A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, B1.EQ.DI, B2.EQ.AU, ,M.EQ.B1, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD /& MISC /& CONT \*\*\* REAL\*COS. WRITE A + B\*w ADG.HOLD CF.HOLD, DIT, RADIX.2, PSD, ADR1, ADP.LDA, ADP.A2 /& WR.CMPX /& R.SUBS A1,A3, , A2.EQ.AU, , B1.HOLD, B2.HOLD, , M.EQ.B1, D.EQ.A2 /& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, B1.HOLD, B2.EQ.AU, B3.EQ.MP, MIO.IN, D.EQ.A2 /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.COS, BUFEN /& MISC /& CJP IF.LOW,FFT.DONE, IBTF.LUP \*\*\* REAL\*SIN. READ A OPERAND ADG.INC CF.HOLD, DIT, RADIX.2, PSD, , ADP.HOLD, ADP.A1 /& RD.MEM & DIV.BY.2 /& R.SUBS B2,MSP, A1.EQ.DI, A2.HOLD, , , B2.EQ.AU, B3.EQ.MP, M.EQ.B1, DIO.IN /& I.SUBS A1,B3, A1.EQ.DI, A2.EQ.AU, A3.HOLD, B1.HOLD, B2.HOLD, , MIO.IN, DIO.IN /& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFEN /& MISC /& CONT ; FLUSH PIPELINE

```
*** IMAG*COS. WRITE A - B*w
ADG.HOLD CF.HOLD, DIT, RADIX.2, PSD, ADR2, ADP.LDB, ADP.B2
/& WR.CMPX
/& R.ADD A1, MSP, A1. HOLD, A2. HOLD, A3. EQ. MP, , B2. EQ. AU, B3. HOLD, , D. EQ. B2
/& I.SUBS A2,A3, A1.HOLD, A2.EQ.AU, , B1.HOLD, , , M.EQ.B1, D.EQ.B2
/& MSPROD MP.ROUND. MP.FRAC. MY.IN. MXY.2C. MX.COS. BUFCD
/& MISC
/& CONT
                                          *** TMAG*STN
ADG.HOLD , DIT, RADIX.2, PSD, , ADP.HOLD
/& RD.MEM & DIV.BY.2
/& R.ADD A2, B3, A1.HOLD, A2.EQ.AU, A3.HOLD, , B2.HOLD
/& I.ADD A1, MSP, A1.HOLD, A2.HOLD, A3.EQ.MP, , B2.EQ.AU, ,M.EQ.B1
/& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C, MX.SIN, BUFCD
/& MISC
/& CONT
                                          *** WRITE A + B*w
ADG.HOLD , DIT, RADIX.2, PSD, , ADP.LDA, ADP.A2
/& WR.CMPX
/& R.SUBS A1,A3, , A2.EQ.AU, , , B2.HOLD, , , D.EQ.A2
/& I.ADD B2,MSP, A1.HOLD, , A3.HOLD, , B2.EQ.AU, B3.EQ.MP, , D.EQ.A2
/& MSPROD MP.ROUND, MP.FRAC, MY.IN, MXY.2C
/& MISC
/& CONT
ADG.HOLD , DIT, RADIX.2, PSD, , ADP.HOLD
/& RD.MEM & DIV.BY.2
/& R.SUBS B2,MSP, , A2.HOLD, , , B2.EQ.AU, B3.EQ.MP
/& I.SUBS A1, B3, , A2.EQ.AU, A3.HOLD, , B2.HOLD
/& MSPROD & MISC
/& CONT
                                          *** WRITE A - B*w
ADG.HOLD . DIT. RADIX.2. PSD. . . ADP.B2
/& WR.CMPX
            , , , , , , , , D.EQ.B2
/& R.MOVE ,
/& I.SUBS A2,A3, , A2.EQ.AU, , , , , , D.EQ.B2
/& MSPROD & MISC
/& CONT
ADG.HOLD , DIT, RADIX.2, PSD
/& RD.MEM & DIV.BY.2
/& R.ADD A2,B3, , A2.EQ.AU
/& I.MOVE , , A2.HOLD
/& MSPROD & MISC
/& CONT
                                          *** WRITE A + B*w
ADG.HOLD , DIT, RADIX.2, PSD, , , ADP.A2
/& WR.CMPX
/& R.MOVE , , , , , , , , D.EQ.A2
/& I.MOVE , , , , , , , , , D.EQ.A2
/& MSPROD & MISC
/& JMP FINISH
EJECT
:
MXMULT::
```

| NO.OP<br>EJECT | &  | JMP   | STA  | R | Т   |    |    | ; | ΓB | DL |   |   |    |            |     |     |     |     |     |       |    |    |    |     |     |     |   |   |   |
|----------------|----|-------|------|---|-----|----|----|---|----|----|---|---|----|------------|-----|-----|-----|-----|-----|-------|----|----|----|-----|-----|-----|---|---|---|
| ;              |    |       |      |   |     |    |    |   |    |    |   |   |    |            |     | * 1 |     |     |     |       |    |    |    |     |     |     |   |   |   |
| ; * *          |    | * *   | * *  |   | * * | *  | *  | - | *  | *  |   |   | *  | π          | *   | * 1 |     |     |     |       |    |    |    | *   | *   | *   | * | * | * |
| ;              |    |       |      |   |     |    |    |   |    |    |   |   |    |            |     |     |     |     |     |       |    |    |    |     |     |     |   |   |   |
| ;              |    | PROC  | CESS |   | VEC | TO | RS |   |    |    |   |   |    |            |     |     |     |     |     |       |    |    |    |     |     |     |   |   |   |
|                |    | ORG   |      |   | 102 | 4. | -  | 8 |    |    |   |   |    |            |     |     |     |     |     |       |    |    |    |     |     |     |   |   |   |
| ;              |    |       |      |   |     |    |    |   |    |    |   |   |    |            |     |     |     |     |     |       |    |    |    |     |     |     |   |   |   |
| NO.OP          | &  | JMP   | STA  | R | Т   |    |    |   |    |    | : | I | NS | <b>FRU</b> | 101 | 101 | N ( | ) : | - 1 | 100   | Ρ  |    |    |     |     |     |   |   |   |
| NO.OP          | &  | J M P | мхм  | Ш | Ι.Т |    |    |   |    |    | • |   |    |            |     |     |     | 1 : | - 1 | 1 A T | RT | x  | мш | LTI | ΓΡΙ | . Y |   |   |   |
| NO.OP          |    |       |      |   |     |    |    |   |    |    |   |   |    |            |     |     |     | ,   | - F | TL    | TE | R  |    |     |     |     |   |   |   |
|                |    |       |      |   | L N |    |    |   |    |    | , |   |    |            |     |     |     | -   |     |       |    |    |    |     |     |     |   |   |   |
| NO.OP          | č. | JMP   | FFT  |   |     |    |    |   |    |    | ; |   |    |            |     |     |     | 3 : | = 1 | FFT   |    |    |    |     |     |     |   |   |   |
| NO.OP          | &  | JMP   | IFF  | Т |     |    |    |   |    |    | ; |   |    |            |     |     | , i | 4:  | = 3 | E N V | ER | SE | F  | FΤ  |     |     |   |   |   |
|                |    | END   |      |   |     |    |    |   |    |    |   |   |    |            |     |     |     |     |     |       |    |    |    |     |     |     |   |   |   |

## APPENDIX 4 IIR SECOND ORDER FILTER SECTION USING THE Am29510 AND Am29PL141

```
: Microcode tield definitions
;
 WORD 32
;
P31
         - Output enable
: P30-26 - Up code
         - Test polarity
: 25
; P24-22 - Test input select
; P21-16 - Instruction data
         - input data taken flag
 P15
:
 P14
         - input select
:
 P13
         - MAC result select
 P12-11 - Shift register control
;
 P10-09 - Register select
:
: PO8-06 - Coefficient select
: 105
         - MAC add/subtract control
: 204
         - MAC pass/accumulate control
; 203
         - MAC round/truncate control
: 202
         - Output data ready flag
: 101
         - Unused
:
 2004
         - Error flag
:
        . . .
:
: Application Definitions
                         * * *
;
DTACK:
        EQU
                 1B#0
                         : Input data taken ack
                         ; Input select tristate control
INSEL:
        LQU
                 1B#0
MACSEL: EQU
                 18#0
                         ; MAC input select = INSEL*
HOLD:
        EQU
                 28#11
                         ; 29520 instructions
LDA:
        EQU
                 28#10
LDB:
        EQU
                 28#01
A 1:
        EQU
                 28#11
                         ; 29520 register select
A2:
        EQU
                 2B#10
B1:
        EQU
                 28#01
B2:
        EQU
                 28#00
WO:
        EQU
                 32#0
                         : Coefficient select
                 30#1
W1:
        EQU
W2:
        EQU
                 3Q#2
W3:
        EQU
                 3Q#3
W4:
        EQU
                 3Q#4
RND:
        EQU
                 18#0
                         : 29510 round control
TRUNC:
        EQU
                 RND*
ACCUM:
        EQU
                 1B#0
                         : 29510 accumulate control
PASS:
        EQU
                 ACCUM*
SUB:
        EQU
                 1B#0
                         : 29510 subtract control
ADD:
        EQU
                 SUB*
DRDY:
        EQU
                 1B#0
                         ; Output data ready
INRDY:
        EQU
                 3Q#0
                         : Test condition input 10
                         ; Polarity control for ready
        ŁQU
READY:
                 B#0
OUTACK: EOU
                 3Q#1
                         ; Test condition input T1
TAKEN: EQU
                 B#0
                         ; Polarity control for taken
```

ï ; Only need a few instructions for this simple design ; CONTINUE: DEF 6H#2D:,10D#0%, : Continue 1 1V, 1V, 1V, 2V, 2V, 3V, 1V, 1V, 1V, 1V, 1X, 1VB#0 6H#39:, 1V, 3V%:, 6V%:, GO.PL.IF: DEF ; Go to pipeline if 1V, 1V, 1V, 2V, 2V, 3V, 1V, 1V, 1V, 1V, 1X, 1VB#0 1 WAIT.TILL: DEF 6H#3A:,1V,3V%:,6V%:, : Wait for test input 1 1V, 1V, 1V, 2V, 2V, 3V, 1V, 1V, 1V, 1V, 1X, 1VB#0 GO.PL: DEF 6H#39:,1B#1,3Q#6,6V%:, : Go to pipeline 1 1V, 1V, 1V, 2V, 2V, 3V, 1V, 1V, 1V, 1V, 1X, 1VB#0 STOP: ; Error halt DEF 6H#3A:,1B#0,3Q#6,6X, 1 1V, 1V, 1V, 2V, 2V, 3V, 1V, 1V, 1V, 1V, 1X, 1VB#0 ; END

```
; This program implements the equation:
y(n) = W(0)^*x(n) + W(1)^*x(n-1) + W(2)^*x(n-2) + W(3)^*y(n-1) + W(4)^*y(n-2)
; The CC input is grounded for unconditional jumps.
; TO is connected to input data ready
: T1 is connected to output data taken
TRUE:
        EQU
                 B#1
; Keep writing data on input lines until valid data is written
; No ops keep data sampling synchronous
INIT:
WAIT.TILL KEADY, INRDY, $+1.
                  DTACK*, INSEL, MACSEL*, LDA, A1, WU, ADD, PASS, RND, DRDY*
CONTINUE
                  DTACK. INSEL*.MACSEL. HOLD. A1. WU. ADD. PASS. RND. DRDY*
CONTINUE
                  DTACK*, INSEL*, MACSEL, HOLD, A1, WU, ADD, PASS, RND, DRDY*
CONTINUE
                  DTACK*, INSEL*, MACSEL, HOLD, A1, WU, ADD, PASS, RND, DRDY*
CONTINUE
                  DTACK*, INSEL*, MACSEL, HOLD, A1, WU, ADD, PASS, RND, DRDY*
; Error if next data sample not ready
GO.PL.IF KEADY*, INRDY, ERROR,
                  DTACK*, INSEL, MACSEL*, LDA, A2, W1, ADD, PASS, TRUNC, DRDY*
1
: Output W(0) \neq x(n)
CONTINUE
                  DTACK, INSEL*, MACSEL, LDB, A2, W1, ADD, PASS, TRUNC, DRDY
; Error if data not taken
GO.PL.IF TAKEN*.OUTACK.ERROR.
                  DTACK*, INSEL*, MACSEL, HOLD, A2, W1, ADD, PASS, TRUNC, DRDY*
:
 Add W(1) * x(n-1)
CONTINUE
                  DTACK*, INSEL*, MACSEL, HOLD, A1, WO, ADD, ACCUM, TRUNC, DRDY*
:
: Add W(3)*y(n-1)
CONTINUE
                  DTACK*, INSEL*, MACSEL, HOLD, B1, W3, ADD, ACCUM, RND, DRDY*
:
: Do W2*x(n-2), read data sample, error if not ready
FOREVER:
GO.PL.IF KEADY*, INRDY, ERROR,
                  DTACK, INSEL, MACSEL*, LDA, A2, W2, ADD, PASS, TRUNC, DRDY*
; Add W1*x(n-1), output previous filtered sample
CONTINUE
                  DTACK*, INSEL*, MACSEL, LDB, A2, W1, ADD, ACCUM, TRUNC, DRDY
; Add WO*x(n), error if output not taken
GO.PL.IF TAKEN*.OUTACK, ERROR.
1
                  DTACK*, INSEL*, MACSEL, HOLD, A1, WO, ADD, ACCUM, TRUNC, DRDY*
 Add W4*v(n-2)
                  DTACK*.INSEL*.MACSEL.HOLD.B2,W4,ADD,ACCUM,TRUNC,DRDY*
CONTINUE
:
; Add W3*y(n-1), loop indefinitely
GO.PL
         FOREVER
1
                  DTACK*.INSEL*.MACSEL.HOLD.B1.W3.ADD.ACCUM.RND.DRDY*
ERROR:
                  DTACK*, INSEL*, MACSEL, HOLD, A1, WO, ADD, ACCUM, TRUNC, DRDY*, TRUE
STOP
END
```

## APPENDIX 5 IIR SIXTH ORDER FILTER USING THE Am29510 AND Am29PL141

```
; Microcode field definitions
;
 WORD 32
;
         - Output enable
; 131
; P30-26 - Up code
      - Test polarity
; 125
 P24-22 - Test input select
;
:
 P21-16 - Instruction data
; P15
         - input data taken flag
; P14
         - Input select
: 113
         - MAC result select
; P12-09 - Operand address
; P08
         - Operand KAM write enable
 P07-04 - Coefficient select
:
; 203
         - MAC add/subtract control
         - MAC pass/accumulate control
 P02
:
         - MAC round/truncate control
 201
:
 P00
         - Output data ready flag
:
;
      . . . . . .
                         . . .
                                      . . . .
                   . . .
:
;
 Application Definitions
;
:
DTACK:
        EQU
                 1B#0
                         ; Input data taken ack
                         ; Input select tristate control
INSEL:
        LQU
                 18#0
                         ; MAC input select = INSEL*
MACSEL: LOU
                 18#0
XN:
        EQU
                 4H#1
                         ; Next data sample x(n+1)
                         ; Current data sample x(n)
XO:
        EQU
                 4H#O
X1:
        EQU
                 4H#F
                         : Previous input x(n-1)
X2:
                 4H#E
        EQU
                 4H#D
X3:
        EQU
X4:
                 4H#C
        EQU
X5:
        EQU
                 4H#B
X6:
        EQU
                 4H#A
Y0:
        EQU
                 4148
                         : Filtered output y(n)
Y1:
        EQU
                 411#7
                         : Previous output y(n-1)
Y2:
        EQU
                 41#6
Y3:
        EQU
                 41.45
Y4:
        EQU
                 4H#4
¥5:
        EOU
                 41#3
```

Y6:

EQU

41#2

| W0:<br>W1:<br>W2:<br>W3: | EQU<br>EQU<br>EQU<br>EQU | 4H#U<br>4H≰<br>4H#2<br>4H#3 | ;     | Coefficient select                       |
|--------------------------|--------------------------|-----------------------------|-------|------------------------------------------|
| W4:                      | EQU                      | 411#4                       |       |                                          |
| W5:                      | EQU                      | 4H#5                        |       |                                          |
| W6:                      | EQU                      | 41#6                        |       |                                          |
| W7:                      | EQU                      | 4H#7                        |       |                                          |
| W8:                      | EQU                      | 4H#8                        |       |                                          |
| W9:                      | EQU                      | 4H#9                        |       |                                          |
| W10:                     | EQU                      | 4H#A                        |       |                                          |
| W11:                     | EQU                      | 4H#B                        |       |                                          |
| W12:                     | EQU                      | 4H#C                        |       |                                          |
| ZERO:                    | EQU                      | 4H#F                        | ;     | Put in a zero coefficient for NOP        |
| WE:                      | EQU                      | 1B#0                        | ;     | Operand KAM write enable                 |
| ADD:                     | EQU                      | 18#0                        | ;     | 29510 add/subtract control               |
| SUBT:                    | EQU                      | A D D =                     |       |                                          |
| ACCUM:                   | EQU                      | 18#0                        | ;     | 29510 accumulate control                 |
| PASS:                    | EQU                      | ACCUM#                      |       |                                          |
| RND:                     | EQU                      | 18#0                        | ;     | 29510 round control                      |
| TRUNC:                   | EQU                      | R N D 🕈                     |       |                                          |
| DRDY:                    | EQU                      | 18#0                        | ;     | Output data ready                        |
| ;                        |                          |                             |       |                                          |
| INRDY:                   | EQU                      | 3Q#0                        | ;     | Test condition input TO                  |
| READY:                   | EQU                      | B#0                         | ;     | Polarity control for ready               |
| OUTACK:                  | EQU                      | 3Q#1                        | ;     | Test condition input T1                  |
| TAKEN:                   | EQU                      | B#0                         | ;     | Polarity control for taken               |
| ;                        |                          |                             |       |                                          |
| ;                        | _                        |                             | • • • |                                          |
| CONTINU                  |                          | DEF                         |       | #2D:,10D#0%, ; Continue                  |
| /                        |                          |                             | -     | 1V, 1V, 1V, 1V                           |
| LD.CREG                  |                          | DEF                         |       | #24:,1B#1,3Q#6,6V%:, ; Load counter      |
| /                        | 10,10,10                 |                             |       | 1V, 1V, 1V, 1V                           |
| LOOP:                    |                          |                             |       | #28:,4X,6V%:, ; Go to label if C<>0      |
| /                        |                          |                             |       | 1V, 1V, 1V, 1V                           |
| WAIT.TI                  |                          |                             |       | #3A:,1V,3V%:,6V%:, ; Wait for test input |
| /                        | 18,18,18                 | <b>, 4V,</b> 1V, 4          | ŧν,   | 1V, 1V, 1V, 1V                           |
| ;                        |                          |                             |       |                                          |
| END                      |                          |                             |       |                                          |

```
; This program implements the equation:
  y(n) = W(0)^{*}x(n) + ... + W(6)^{*}x(n-6) + W(7)^{*}y(n-1) + ... + W(12)^{*}y(n-6)
  The CC input is grounded for unconditional jumps.
  TO is connected to input data ready
  T1 is connected to output data taken
TRUE:
         EQU
                 B#1
  Make previous operands = 0
;
INIT:
LD.CREG D#15.
                 DTACK, INSEL*, MACSEL*, XN, WE, WU, ADD, PASS, TRUNC, DRDY*
                 DTACK, INSEL*, MACSEL*, XN, WE, WO, ADD, PASS, TRUNC, DRDY*
LOOP $.
FOREVER:
WAIT.TILL KEADY, INRDY, $+1,
                                    : Synchronize to input clock
                 DTACK*.INSEL.MACSEL*.XN.WE.WU.ADD.PASS.TRUNC.DRDY*
 Do W() * x(n)
CONTINUE
                 DTACK. INSEL*, MACSEL, XO, WE*, WU, ADD, PASS, TRUNC, DRDY*
: Add W1*x(n-1)
CONTINUE
                 DTACK*, INSEL*, MACSEL, X1, WE*, W1, ADD, ACCUM, TRUNC, DRDY*
: Add W2*x(n-2)
CONTINUE
                 DTACK*, INSEL*, MACSEL, X2, WE*, W2, ADD, ACCUM, TRUNC, DRDY*
: Add W3*x(n-3)
CONTINUE
                 DTACK*, INSEL*, MACSEL, X3, WE*, W3, ADD, ACCUM, TRUNC, DRDY*
; Add W4*x(n-4)
                 DTACK*, INSEL*, MACSEL, X4, WE*, W4, ADD, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W5*x(n-5)
CONTINUE
                 DTACK*, INSEL*, MACSEL, X5, WE*, W5, ADD, ACCUM, TRUNC, DRDY*
 Add W6 \pm x(n-6)
CONTINUE
                 DTACK*, INSEL*, MACSEL, X6, WE*, W6, ADD, ACCUM, TRUNC, DRDY*
; Add w7*y(n-1)
CONTINUE
                 DTACK*, INSEL*, MACSEL, Y1, WE*, W7, ADD, ACCUM, TRUNC, DRDY*
: Add W8*y(n-2)
CONTINUE
                 DTACK*.INSEL*.MACSEL.Y2.WE*.W8.ADD.ACCUM.TRUNC.DRDY*
; Add W9*y(n-3)
CONTINUE
                 DTACK*, INSEL*, MACSEL, Y3, WE*, W9, ADD, ACCUM, TRUNC, DRDY*
: Add W10*v(n=4)
                  DTACK*, INSEL*, MACSEL, Y4, WE*, W10, ADD, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W11*y(n-5)
CONTINUE
                  DTACK*.INSEL*.MACSEL.Y5.WE*.W11.ADD.ACCUM,TRUNC.DRDY*
: Add W12*y(n-6)
CONTINUE
                  DTACK*.INSEL*.MACSEL.Y6.WE*.W12,ADD,ACCUM,RND,DRDY*
: Wait for pipeline delay
                 DTACK*, INSEL*, MACSEL, YO, WE*, ZERO, ADD, ACCUM, TRUNC, DRDY*
CONTINUE
; Output data, then loop back for next sample
WAIT.TILL TAKEN, OUTACK, FOREVER,
                  DTACK*.INSEL*.MACSEL.YO.WE.ZERO.ADD.PASS.TRUNC.DRDY
1
END
```

## APPENDIX 6 IIR ORDER 15 FILTER USING THE Am29510 AND Am29PL141

```
: Microcode field definitions
;
 WORD 32
;
; 131
         - Output enable
; P30-26 - Up code
      - Test polarity
; 225
; P24-22 - Test input select
; P21-16 - Instruction data
; P15
        - Input data taken flag
; P14
         - Input select
; P13
         - MAC result select
 P12-09 - Uperand address
;
       - Operand RAM write enable
:
 P08
 P07-03 - Coefficient select
;
; 202
      - MAC pass/accumulate control
; P01
         - MAC round/truncate control
;
 2004
        - Output data ready flag
:
:
 Application Definitions
;
:
                             . .
:
DTACK:
        EQU
                 1B#0
                         ; Input data taken ack
INSEL:
        EQU
                 1B#0
                         ; Input select tristate control
MACSEL: LOU
                 1B#0
                         ; MAC input select = INSEL*
XN:
                 4H#1
        EQU
                         ; Next data sample x(n)
Z 0:
        EQU
                 4H#O
Z 1:
        EQU
                 4H#F
                         ; Intermediate result delayed once
Z2:
        EQU
                 4H#E
Z3:
        EQU
                 4H#D
Z4:
        EQU
                 4H#C
Z5:
        EQU
                 4H#B
Z6:
        EQU
                 4H#A
Z7:
                 4H#9
        EQU
Z8:
        EQU
                 4H#8
Z9:
        EQU
                 4H#7
Z10:
        EQU
                 4H#6
Z11:
        EQU
                 411#5
Z12:
        EOU
                 4H#4
                 4H#3
Z13:
        EQU
Z14:
        EQU
                 4H#2
```

| WO:          | EQU        | 5D#0%            | ;     | Coefficient select                                        |
|--------------|------------|------------------|-------|-----------------------------------------------------------|
| W1:          | EQU        | 5D#1%            |       |                                                           |
| W2:          | EQU        | 5D#2%            |       |                                                           |
| W3:          | EQU        | 5D#3%            |       |                                                           |
| W4:          | EQU        | 5D#4%            |       |                                                           |
| W5:          | EQU        | 5D#5%            |       |                                                           |
| W6:          | EQU        | 5D#6%            |       |                                                           |
| W7:          | EQU        | 5D#7%            |       |                                                           |
| W8:          | EQU        | 5D#8%            |       |                                                           |
| W9:          | EQU        | 5D#9%            |       |                                                           |
| W10:         | EQU        | 5D#10%           |       |                                                           |
| W11:         | EQU        | 5D#11%           |       |                                                           |
| W12:         | EQU        | 5D#12%           |       |                                                           |
| W13:         | EQU        | 5D#13%           |       |                                                           |
| W14:         | EQU        | 5D#14%           |       |                                                           |
| W15:         | EQU        | 5D#15%           |       |                                                           |
| W16:         | EQU        | 5D#16%           |       |                                                           |
| W17:         | EQU        | 5D#17%           |       |                                                           |
| W18:         | EQU        | 5D#18%           |       |                                                           |
| W19:         | EQU        | 5D#19%           |       |                                                           |
| W20:         | EQU        | 5D#20%           |       |                                                           |
| W21:         | EQU        | 5D#21%           |       |                                                           |
| W22:         | EQU<br>EQU | 5D#22%           |       |                                                           |
| W23:<br>W24: | EQU        | 5D#23%<br>5D#24% |       |                                                           |
| W24:<br>W25: | EQU        | 5D#24%<br>5D#25% |       |                                                           |
| W25:         | EQU        | 5D#25%           |       |                                                           |
| W27:         | EQU        | 5D#27%           |       |                                                           |
| W28:         | EQU        | 5D#28%           |       |                                                           |
| ZERO:        | EQU        | 5D#31%           |       | Zero coefficient for NOOP                                 |
| WE:          | EQU        | 1B#0             |       | Operand KAM write enable                                  |
| RND:         | EQU        | 1B#0             |       | 29510 round control                                       |
| TRUNC:       | EQU        | RND#             | ,     |                                                           |
| ACCUM:       | LQU        | 1B#0             | :     | 29510 accumulate control                                  |
| PASS:        | EQU        | ACCUM*           | •     |                                                           |
| DRDY:        | EQU        | 1B#0             | ;     | Output data ready                                         |
| ;            |            |                  |       |                                                           |
| INRDY:       | FQU        | 3Q#0             | ;     | Test condition input TO                                   |
| READY:       | FON        | B#0              |       | Polarity control for ready                                |
| OUTACK:      |            | 3Q#1             |       | Test condition input T1                                   |
| TAKEN:       | LQU        | B#0              | ;     | Polarity control for taken                                |
| ;            |            |                  |       |                                                           |
| ;            | _          |                  |       |                                                           |
| CONTINU      |            | DEF              |       | 1#2D:,10D#0%, ; Continue                                  |
| /            |            |                  |       | , 1V, 1V, 1V                                              |
| LD.CREG      |            | DEF              |       | H#24:,1B#1,3Q#6,6V%:, ; Load counter                      |
| /            | IV, IV, 1  |                  |       | , 1V, 1V, 1V                                              |
| LOOP:<br>/   | 1 1 1 1 1  |                  |       | <pre>##28:,4X,6V%:, ; Go to pipeline if C&lt;&gt;0</pre>  |
| WAIT.TI      |            | DEF              |       | ,1V,1V,1V<br>1#3A:,1V,3V%:,6V%:,    ; Wait for test input |
| /            |            |                  |       | 1#3A:,1V,3V%:,6V%:,    ; Wait for test input<br>,1V,1V,1V |
| ;            |            | * ,              | , v ر | , , ,                                                     |
| END          |            |                  |       |                                                           |
|              |            |                  |       |                                                           |

```
; This program implements the equations:
z(n) = W(0)^*x(n) + W(1)^*z(n-1) + ... + W(14)^*x(n-14)
z(n) = z(n) + W(15) + z(n-1) + ... + W(28) + z(n-14)
; The CC input is grounded for unconditional jumps.
; TO is connected to input data ready
; T1 is connected to output data taken
TRUE:
         EQU
                  B#1
; Make previous operands = 0
INIT:
LD.CREG D#15.
                  DTACK, INSEL*, MACSEL*, ZO, WE, WU, PASS, TRUNC, DRDY*
LOOP $,
                  DTACK, INSEL*, MACSEL*, ZU, WE, WU, PASS, TRUNC, DRDY*
FOREVER:
WAIT.TILL READY.INRDY.$+1.
                                    : Wait for data to start
                  DTACK*, INSEL, MACSEL*, XN, WE, WU, PASS, TRUNC, DRDY*
1
: Acknowledge input. do WO*x(n)
CONTINUE
                  DTACK, INSEL*, MACSEL, XN, WE*, WU, PASS, TRUNC, DRDY*
: Add W1*z(n-1)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z1, WE*, W1, ACCUM, TRUNC, DRDY*
: Add W2 \approx z(n-2)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z2, WE*, W2, ACCUM, TRUNC, DRDY*
: Add W3*z(n-3)
CONTINUE
                  DTACK*,INSEL*,MACSEL,Z3,WE*,W3,ACCUM,TRUNC,DRDY=
: Add W4 \approx z(n-4)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z4, WE*, W4, ACCUM, TRUNC, DRDY*
: Add W5*z(n-5)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z5, WE*, W5, ACCUM, TRUNC, DRDY*
: Add w6*z(n-6)
                  DTACK*, INSEL*, MACSEL, Zo, WE*, Wo, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W7 \neq z(n-7)
                  DTACK*, INSEL*, MACSEL, Z7, WE*, W7, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W8*z(n-8)
                  DTACK*, INSEL*, MACSEL, Z&, WE*, W&, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W9*z(n-9)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z9, WE*, W9, ACCUM, TRUNC, DRDY*
: Add W10*z(n-10)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z10, WE*, W10, ACCUM, TRUNC, DRDY*
; Add W11*z(n-11)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z11, WE*, W11, ACCUM, TRUNC, DRDY*
: Add W12*z(n-12)
                  DTACK*, INSEL*, MACSEL, Z12, WE*, W12, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W13*z(n-13)
                  DTACK*, INSEL*, MACSEL, Z13, WE*, W13, ACCUM, TRUNC, DRDY*
CONTINUE
: Add W14*z(n-14)
CONTINUE
                  DTACK*, INSEL*, MACSEL, Z14, WE*, W14, ACCUM, TRUNC, DRDY*
;
```

6-3

Save intermediate result : DTACK\*, INSEL\*, MACSEL, ZO, WE\*, ZERO, ACCUM, TRUNC. DRDY\* CONTINUE DTACK\*, INSEL\*, MACSEL, ZO, WE, ZERO, ACCUM, TRUNC, DRDY\* CONTINUE : Add W15\*z(n-1) CONTINUE DTACK\*.INSEL\*.MACSEL.Z1.WE\*.W15.ACCUM.TRUNC.DRDY\* : Add W16\*z(n-2) DTACK\*.INSEL\*.MACSEL.Z2.WE\*.W16.ACCUM.TRUNC.DRDY\* CONTINUE : Add W17\*z(n-3)CONTINUE DTACK\*.INSEL\*.MACSEL.Z3.WE\*.W17.ACCUM.TRUNC.DRDY\* : Add W18\*z(n=4) DTACK\*, INSEL\*, MACSEL, Z4, WE\*, W18, ACCUM, TRUNC, DRDY\* CONTINUE : Add W19\*z(n-5) DTACK\*, INSEL\*, MACSEL, 25, WE\*, W19, ACCUM, TRUNC, DRDY\* CONTINUE : Add W20\*z(n-6) DTACK\*.INSEL\*.MACSEL.Z6.WE\*.W20.ACCUM.TRUNC.DRDY\* CONTINUE : Add W21\*z(n=7) CONTINUE DTACK\*, INSEL\*, MACSEL, ZY, WE\*, W21, ACCUM, TRUNC, DRDY\* ; Add W22\*z(n=8) CONTINUE DTACK\*.INSEL\*.MACSEL.Z8.WE\*.W22.ACCUM.TRUNC.DRDY\* : Add W23\*z(n-9) DTACK\*, INSEL\*, MACSEL, Z9, WE\*, W23, ACCUM, TRUNC, DRDY\* CONTINUE : Add W24\*z(n-10)CONTINUE DTACK\*, INSEL\*, MACSEL, Z10, WE\*, W24, ACCUM, TRUNC, DRDY\* : Add W25\*z(n-11) CONTINUE DTACK\*.INSEL\*.MACSEL.Z11.WE\*.W25.ACCUM.TRUNC.DRDY\* : Add W26\*z(n-12) DTACK\*, INSEL\*, MACSEL, Z12, WE\*, W26, ACCUM, TRUNC, DRDY\* CONTINUE : Add W27\*z(n-13) CONTINUE DTACK\*.INSEL\*.MACSEL.Z13.WE\*.W27.ACCUM.TRUNC.DRDY\* : Add W28\*z(n-14) CONTINUE DTACK\*, INSEL\*, MACSEL, Z14, WE\*, W28, ACCUM, RND, DRDY\* ; Wait for pipeline delay DTACK\*.INSEL\*.MACSEL.XN.WE\*.ZERO.ACCUM.TRUNC.DRDY\* CONTINUE ; Wait for data taken, then loop back for next sample WAIT.TILL TAKEN, OUTACK, FOREVER 1 DTACK\*,INSEL\*,MACSEL,XN,WE\*,ZERO,ACCUM,TRUNC,DRDY ; END























## ADVANCED MICRO DEVICES, INC. 901 Thompson Place P.O. Box 3453 Sunnyvale, California 94088 (408) 732-2400 TWX: 910-339-9280 TELEX: 34-6306 TOLL FREE (800) 538-8450

Order #04779B

Printed in U.S.A. IH-MU-15M-6/86-0