
DETECTION IN COMMUNICATION RECEIVERS: A CASE FOR ON-LINE ARITHMETIC


Sridhar Rajagopal and Joseph R.Cavallaro

Department of Electrical and Computer Engineering MS-366

Rice University

Houston, TX, USA

sridhar, cavallar@…

Telephone: 1-713-348-2256, 1-713-348-4719

Fax: 1-713-348-6196

This work was presented in part at the IEEE International Symposium on Computer Arithmetic, Vail, CO, 2001 and submitted in part for the IEEE International Conference on Application-specific Systems, Architectures and Processors, San Jose, CA, 2002. This work was supported by Nokia, Texas Instruments, the Texas Advanced Technology Program under grant 1999-003604-080, and by NSF under grant ANI-9979465.

Abstract

This paper presents low power architectures using conventional and on-line arithmetic to accelerate detection in mobile communication receivers. We identify the potential application of on-line arithmetic for simple and advanced detection algorithms in mobile communication systems. We develop innovations in on-line arithmetic modules for better integration with conventional arithmetic systems. We explore a suite of tradeoffs in the usage of on-line arithmetic on throughput, area and latency, and its effects on bit error rate performance. We design novel detector circuits using radix-4 on-line arithmetic for detection algorithms. Our detector can directly interface with conventional arithmetic circuits if necessary and hence be used as co-processor support to a DSP. A time-constrained conventional arithmetic detector using carry-save adders is first presented that achieves a 1.62x to 1.86x speedup in throughput and a corresponding 1.31x to 1.08x reduction in area across the bit precisions considered, against a direct implementation. The on-line arithmetic detector, however, does much better, providing a 3.94x to 5.94x speedup in throughput and area reduction from 114.55 respectively against the same implementation. Hence, by eliminating extraneous computations, decreasing area requirements and simultaneously increasing throughput, on-line arithmetic can provide significant power savings and is well suited for detection in mobile wireless receivers. We also show the application of on-line arithmetic for multiuser detection, soft decisions and higher modulation schemes in advanced communication receivers.

Keywords

On-line arithmetic, on-line addition, on-line multiplication, detection, W-CDMA, low power, VLSI implementation, mobile communication systems.

I. INTRODUCTION

Advanced communication systems, such as those based on W-CDMA (Wideband Code Division Multiple Access) [1], EDGE (Enhanced Data rates for Global Evolution) [2], and 802.11a OFDM (Orthogonal Frequency Division Multiplexing) based Wireless LAN [3], are being designed to increase data rates by orders of magnitude and enhance performance significantly in order to support real-time multimedia services in the near future. However, the increase in complexity required by the algorithms to support these high data rates and multimedia capabilities has made it difficult to achieve real time hardware implementations. Detection is one of the core baseband processing operations in a mobile communication receiver and is used to find the transmitted bits from the source after they have been corrupted by noise and other interference present in the channel. The data rates that can be achieved in a communication system are dependent on the speed of detection in hardware. Hence, a mobile receiver not only has to be area and power efficient to meet the mobile device constraints, but also has to accelerate detection to meet real-time performance requirements.

On-line arithmetic [4],[5],[6] has been shown to be very useful for many signal processing applications such as DCT, FFT, CORDIC, filtering and matrix based operations [7],[8]. In conventional arithmetic, all digits of the result must be computed before the next operation can be started. However, in on-line arithmetic, the operands, as well as the results, flow through the computations in a digit-by-digit manner, starting from the most significant digit (Most Significant Digit First, MSDF). The advantages of on-line arithmetic are that it eliminates carry-propagate addition, reduces interconnection bandwidth between modules and allows parallelism between several operations. With a serial data flow, on-line arithmetic can be pipelined to implement sophisticated algorithms. As carry-propagation is eliminated, on-line operations can be overlapped. The digit-serial nature of on-line arithmetic can be extremely beneficial for small area and low power architectures.

Though on-line arithmetic has been shown to provide a speed-up of 2 to 16 times [9] for conventional numerical operations, this gain is reduced when conventional arithmetic systems are deeply pipelined. This is because the word-parallel logarithmic-time computations attain better throughput than the digit-serial on-line computations (though at the expense of larger area and higher latency). The other implementation tradeoffs related to the applicability of on-line arithmetic are its need for a non-conventional number system, conversions to and from a conventional system [10] and the inherently serial nature of the operations. We overcome all of these disadvantages to make a strong case for the use of on-line arithmetic for detection in communication systems.

The main contributions of our work are as follows. We identify the potential applications of on-line arithmetic for detection in communication systems. We develop innovations in on-line arithmetic modules for better integration with conventional arithmetic systems. We explore a suite of tradeoffs in the usage of on-line arithmetic on throughput, area and latency and its effects on the bit error rate performance. We first present the optimal, variable throughput on-line arithmetic detector that obtains the same bit error rate as the detector using conventional arithmetic. Then, we present a truncated, fixed throughput, on-line arithmetic detector to attain higher throughput and we quantify its performance on the bit error rates of the communication receiver. Then, we present a time-constrained conventional arithmetic architecture based on Carry-Save Adders (CSAs) for comparison purposes. We also develop novel radix-4 on-line multipliers and on-line adders that allow our circuit to directly interface with a conventional arithmetic circuit if necessary, for example, as co-processor support to a DSP. In all the architectures discussed, we present time-constrained versions in order to attain the maximum data rates and then compare the area requirements for these architectures to determine their utility in a low power architecture design.

After initially applying on-line arithmetic to simple detection problems and demonstrating its effectiveness, we show that on-line arithmetic is also effective for more sophisticated detection problems. In many emerging standards and technologies, the detection problem becomes more sophisticated. For example, at the base-station, there are many users interfering with each other and the detection problem for a single user grows into a multiuser detection problem [11],[12]. Even in the handset, detection for the user needs to account for interference from other users [13]. Decoding algorithms [14],[15] sometimes use soft decisions from the detector to provide better decoding of the information. Emerging standards [1],[2],[3],[16] also support higher data rates which use higher modulation schemes. These detection schemes need to detect multiple bits per symbol, requiring more boundaries for detection. We outline the application of on-line arithmetic to multiuser detection, soft decisions and to these higher modulation schemes.

We have organized this paper as follows. The next section presents the advantages of using on-line arithmetic for detection and the bit error rate trade-offs. Section 3 presents the radix-4 on-line adder and multiplier design. Section 4 shows the architecture design of the detector based on conventional and on-line arithmetic. In Section 5, the area-time tradeoffs of the different detector architectures and results are presented. Section 6 shows the application of on-line arithmetic to obtain soft decisions and detection for higher modulation schemes.

II. DETECTION IN MOBILE COMMUNICATION RECEIVERS

Figure 1 shows the main blocks in a mobile communication receiver. The physical baseband layer in a mobile communication receiver involves operations to detect and decode the transmitted information bits. Sophisticated algorithms for channel estimation, detection and decoding are applied at the receiver to determine the transmitted bits. Based on these high-precision operations, a hard decision (a sign-based test) is made on the transmitted bits for detection, which are typically $\pm 1$'s, assuming a Binary Phase Shift Keying (BPSK) system for simplicity. Detection for Quadrature Phase Shift Keying (QPSK) [17], used in current W-CDMA standards, can be easily translated to two BPSK detections as the transmissions in the two dimensions are orthogonal.

Fig. 1. Block diagram of a mobile communication receiver. The detector produces either single-bit precision outputs (hard decisions) of the detected bits or higher precision outputs (soft decisions). The final decision on the transmitted information is made at the decoder.

A. Code matched filter and multiuser detection algorithms

A brief description of the detection schemes used at the base-station in CDMA-based wireless communication receivers is given here to indicate the type of operations involved. For details, the reader is referred to [11],[12],[18]. Let $r_i$ be the received signal and $A$ be the true cross-correlation matrix. $N$ is the length of the spreading code (also known as the spreading gain or the spreading factor) and $K$ is the number of users in the system. Let $b_i$ be the bits of the $K$ users to be detected. Then, the system can be formulated as given below:

$$r_i = A b_i + \eta_i \qquad (1)$$

where $\eta_i$ is the Additive White Gaussian Noise (AWGN) in the system. The single user detector or the matched filter detector with hard decision outputs is shown to be:

$$\hat{b}_{i,p} = \mathrm{sign}(\hat{A}_p^H r_i) \qquad (2)$$

where $\hat{b}_{i,p}$ and $\hat{A}_p$ refer to the estimate of the detected bit and the estimate of the channel respectively (as opposed to the true value). Figure 2 shows the architecture of a single user matched filter detector, which exhibits an inner (dot) product structure for computations followed by sign-based testing. The subscript, $p$, refers to the user in the system. The received signal coming from the A/D converter typically has 8-bit or greater precision [19]. Depending on the algorithm implementation and the desired bit error rate performance, the precision requirements of the hardware processing the received signal are typically in the 8-16 bit range for finite precision implementations such as those in [18],[20],[21], with more sophisticated algorithms requiring greater precision for operations such as division and matrix inversions. This implies extraneous computations in a conventional number system, as the sign is obtained only at the end due to the Least Significant Digit First (LSDF) nature of computations. On-line arithmetic, based on a signed digit number representation, provides Most Significant Digit First (MSDF) computation. Hence, the computations can stop after the first non-zero MSD (sign) is computed and additional computations for the successive digits are avoided. Back-conversion to a conventional number system is also not required, as the sign of the digit represents the detected bit.

Fig. 2. A code matched filter detector for the $p$-th user in the system. The operations involve testing a dot product for its sign.
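As a minimal sketch of the sign-based test in the matched filter detector, the Python fragment below computes the dot product of a code vector with the received chips and keeps only its sign. It assumes real-valued signals (so the Hermitian transpose reduces to an ordinary dot product) and a unit channel gain; the spreading code, the noise samples and the function names are hypothetical and chosen only for illustration.

```python
def matched_filter_bit(a_hat, r):
    """Hard decision for one user: the sign of the dot product of the
    estimated code/channel vector a_hat with the received chips r."""
    acc = sum(a * x for a, x in zip(a_hat, r))
    return 1 if acc >= 0 else -1


# Hypothetical spreading code of length N = 8 and fixed noise samples.
code = [1, -1, 1, 1, -1, -1, 1, -1]
noise = [0.3, -0.2, 0.5, -0.4, 0.1, 0.6, -0.3, 0.2]


def transmit(bit):
    """Spread one BPSK bit (+1 or -1) over the code and add the noise."""
    return [bit * c + n for c, n in zip(code, noise)]


detected = [matched_filter_bit(code, transmit(b)) for b in (1, -1)]
print(detected)
```

Only the sign of the accumulated sum is used, which is why computing every low-order digit of the dot product is wasted effort for a hard decision.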

The code matched filter is also useful for the mobile handset algorithms and can be used as an initial estimate for more advanced handset schemes [13],[22],[23],[24]. At the base-station, the multiuser detector [11],[12] helps in canceling the interference of other users in the system and is one of the advanced algorithms proposed for detection in 3G systems for better performance. One of the popular algorithms for multiuser detection which has good error rate performance and is suitable for implementation is Parallel Interference Cancellation (PIC) [25],[26],[27],[18]. The multistage multiuser detector uses the soft decisions of the matched filter to get an initial estimate of the bits and then subtracts the interference caused by other users. The multistage detector performs parallel interference cancellation iteratively in stages.

Fig. 3. Time-constrained pipelined detector architecture. The figure shows the matched filter, which takes the received signal and the channel estimates and provides the initial estimates for parallel interference cancellation. A 3-stage parallel interference cancellation detector producing the detected bits is shown. All stages are identical. The first stage of the detector is expanded in detail.

(3)

(4)

Equation (3) may be thought of as subtracting the interference from the past bits of users with more delay, and the future bits of users with less delay, than the desired user. The left matrix stands for the partial correlation between the past bits of the interfering users and the desired user; the right matrix stands for the partial correlation between the future bits of the interfering users and the desired user. The center matrix is the correlation of the current bits of interfering users, and its diagonal elements are zeros since only the interference from other users, represented by the non-diagonal elements, is canceled. The lower index represents time, while the upper index represents the iterations. We assume that these matrices, which are derived from the channel estimates, are pre-computed as part of the channel estimation procedure [18].

Figure 3 shows the architecture of the multistage multiuser detector [27] based on equations (3) and (4). The detector takes in the soft decisions and the hard decisions from the asynchronous matched filter. The partial correlation matrices (channel estimates) are also loaded. The detector subtracts the interference iteratively in stages from the matched filter output. Three stages of PIC are shown to be sufficient for convergence of multistage detection [26]. The detected bits are then sent to the decoder for retrieving the transmitted information. Each stage of the multiuser detector uses only adders because multiplication by single bits can be reduced to addition and subtraction.
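The adder-only structure of a cancellation stage can be illustrated with a simplified sketch. The Python fragment below assumes a synchronous system, so only the center (current-bit) correlation matrix appears and the past-bit and future-bit terms of the asynchronous detector are left out; it is therefore not the detector of equations (3) and (4). The matrix `C`, the matched filter outputs `y0` and all function names are hypothetical.

```python
def sign(v):
    return 1 if v >= 0 else -1


def pic_stage(y0, C, bits):
    """One simplified PIC stage. For user k, cancel the interference of every
    other user j from the matched filter output y0[k]. Because bits[j] is +1
    or -1, the product C[k][j] * bits[j] reduces to adding or subtracting
    C[k][j], so no multiplier is needed."""
    y = []
    for k in range(len(y0)):
        acc = y0[k]
        for j in range(len(bits)):
            if j == k:
                continue  # zero diagonal: a user does not cancel itself
            if bits[j] > 0:
                acc -= C[k][j]
            else:
                acc += C[k][j]
        y.append(acc)
    return y, [sign(v) for v in y]


def pic_detect(y0, C, stages=3):
    """Start from the matched filter hard decisions and run identical stages."""
    bits = [sign(v) for v in y0]
    for _ in range(stages):
        _, bits = pic_stage(y0, C, bits)
    return bits


# Hypothetical two-user case with true bits [+1, -1]; the strong second user
# pushes the first user's matched filter output to the wrong sign.
C = [[0.0, 1.5], [0.5, 0.0]]
y0 = [1.0 * 1 + 1.5 * (-1), 3.0 * (-1) + 0.5 * 1]
print([sign(v) for v in y0], pic_detect(y0, C))
```

In this example the matched filter alone decides the first user's bit incorrectly, and the cancellation stages recover it.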

B. On-line arithmetic background

On-line arithmetic algorithms [4],[7] work in a digit-serial manner, producing the result in an MSDF fashion. To generate the first output digit, $\delta + 1$ digits of the input are required. Thereafter, with each digit of the input, a new digit of the result can be obtained. The on-line delay, $\delta$, is typically a small integer, e.g. 1 to 4. Since the outputs are produced serially, the algorithms can be pipelined with a latency of $\delta$ as shown below:

Input....

Output....

In order to achieve MSDF operations, on-line algorithms need to use a redundant number system [28] for carry-free addition. The on-line representation of a number is given by [4]

$$x[j] = \sum_{i=1}^{j+\delta} x_i r^{-i}$$

where $x[j]$ represents the value of the addends at step $j$, $r$ is the radix of the redundant number system and $\delta$ is the on-line delay. The digits $x_i$ belong to a redundant digit set $\{-a, \ldots, -1, 0, 1, \ldots, a\}$ (assumed symmetric) where $a$, with $r/2 \le a \le r-1$, represents the amount of redundancy in the number system. For our system, we shall assume $a = r - 1$ for maximum redundancy, as this will allow the inputs to the system to be directly acceptable in redundant form for on-line operations.
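A short sketch may help make the redundant representation concrete. The Python fragment below assumes a radix-4 maximally redundant digit set {-3, ..., 3} with digit weights 4^-1, 4^-2, and so on; the helper names are hypothetical. It shows that one value can have several digit strings, and that for a finite digit string the first non-zero digit, scanned most significant digit first, already fixes the sign.

```python
from fractions import Fraction
from itertools import product

RADIX = 4
A = RADIX - 1  # maximally redundant digit set {-3, ..., 3} (assumption)


def value(digits):
    """Exact value of an MSDF digit string d_1 d_2 ... with weights 4^-1, 4^-2, ..."""
    return sum(Fraction(d, RADIX ** i) for i, d in enumerate(digits, start=1))


def msdf_sign(digits):
    """Scan from the most significant digit and stop at the first non-zero digit."""
    for d in digits:
        if d != 0:
            return 1 if d > 0 else -1
    return 0


def true_sign(v):
    return (v > 0) - (v < 0)


# Redundancy: two different digit strings represent the same value 1/16.
same_value = value([1, -3]) == value([0, 1])

# Exhaustive check over all 3-digit strings: the first non-zero digit gives the sign.
sign_ok = all(
    msdf_sign(ds) == true_sign(value(ds))
    for ds in product(range(-A, A + 1), repeat=3)
)
print(same_value, sign_ok)
```

The early-stop property is what lets a sign-based detector ignore the remaining digits once a non-zero digit has appeared.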

C. Using on-line arithmetic for detection

Figure 4 shows a simple example to motivate the use of on-line arithmetic for detection. Figure 4(a) shows a simple sign based detector for the received signal, which has been transmitted in an AWGN communication channel. The probability distribution of the received signal [17] is as shown in Figure 4(b-1). The sign of the received signal determines the bit that has been transmitted. Figures 4(b-2) and 4(b-3) represent the time taken by conventional and on-line arithmetic implementations of the detector to find the sign, assuming 8-bit precision and radix-4 on-line arithmetic modules. The y-axis is normalized to the time taken to find the first digit using on-line arithmetic. Conventional arithmetic, being logarithmic time with precision, will have a lower throughput than the on-line throughput per digit (shown only roughly in the figure, for illustrative purposes). The figure shows that the on-line arithmetic scheme can take variable time to detect the sign, while the conventional arithmetic detector finds the sign in constant time (assuming constant precision for both schemes). Hence, we will have performance gains as long as the expected time to detect the sign (the first non-zero MSD) is less than that of conventional arithmetic for the same precision. From Figure 4(b), it is clear that the larger the radix used for the on-line detection scheme, the faster it is to determine the sign. Hence, a higher radix system is preferable for our design. However, higher radix systems have overhead in terms of area and delay. We choose a radix-4 system as a trade-off point, as it covers most of the area under the probability density curve for the first digit and it reduces the on-line delay for multiplication and addition [4],[29].

Fig. 4. Use of on-line arithmetic for detection. Part (a) shows a noisy communication channel through which a +1 or -1 is transmitted. Part (b) shows the time taken by conventional and on-line arithmetic to detect the sign of the received signal, plotted as normalized time taken for addition against normalized received signal amplitude. To have performance benefits, the sign should be detected before the on-line arithmetic time exceeds the conventional arithmetic time. An 8-bit precision and a radix-4 on-line module are assumed.

On-line arithmetic, being digit serial, has a lower throughput if all the digits are required in the computation. Since the probability of obtaining the sign using the first digit is high, we truncate the on-line detector to stop the computations immediately after the first digit has been received. However, the throughput speedup due to truncation at the first digit comes at the expense of an increase in the probability of error. This is shown in Figure 5.

Fig. 5. Probability of error for truncated detection. The figure shows the increase in probability of error assuming only the first digit or first 2 digits are used for detection in a BPSK system.

The probability of error for an optimal detector for BPSK [17] is given by

$$P_e = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) = Q\left(\frac{\sqrt{E_b}}{\sigma}\right) \qquad (5)$$

where $Q(\cdot)$ is the Q-function [17], $E_b$ is the energy of the transmitted bit, $N_0$ is the noise energy and $\sigma^2$ is the variance of the noise. However, if we detect only the MSDs, then the probability of error for the on-line detector for BPSK can be shown to be

(6)

Equation (6) depends on the radix used in the on-line implementation and on the number of MSDs detected. With increasing radix or number of digits detected, we get increasingly closer to the optimal detector in equation (5). Figure 5 shows the probability of error for a radix-4 number system when the first one or the first two most significant digits are used, and compares it with the optimal detector. We can see a drop in performance when only the first MSD is used for detection. This may still be acceptable for higher throughput, as there could be a decoder following this detector to compensate for the bit error rate degradation. However, an additional digit makes the performance of the on-line detector close to that of the optimal detector. Hence, we will consider both a 1-MSD and a 2-MSD truncated on-line detector in this paper for comparison purposes, instead of a variable throughput, full precision, on-line detector architecture as presented in our recent work [30].

Figure 5 assumes a simple transmission system without any CDMA spreading code applied on top of it (i.e., a spreading gain of 1). The error rate curve for the CDMA matched filter detector of equation (2) is similar, except that the Signal to Noise Ratio (SNR) is scaled by the spreading gain (giving better performance for the same SNR). This assumes perfect knowledge of the channel estimates of the user. With proper channel estimation for fading channels, a closed form expression for the probability of error is no longer available, but performance similar to Figure 5 can be expected.
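The effect of the spreading gain on the optimal BPSK error rate can be sketched numerically. The Python fragment below uses the standard result that the BPSK error probability is the Q-function evaluated at the square root of twice the SNR, and it assumes that the SNR passed in is the per-chip value, which the spreading gain then multiplies; that interpretation and the function names are assumptions made for illustration. It does not model the truncated on-line detector.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), written in terms of erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpsk_error_probability(snr_db, spreading_gain=1):
    """Optimal BPSK error probability Q(sqrt(2 * gain * SNR)).
    snr_db is taken as the SNR before despreading (assumption)."""
    snr = spreading_gain * 10.0 ** (snr_db / 10.0)
    return q_function(math.sqrt(2.0 * snr))


for gain in (1, 8):
    print(gain, bpsk_error_probability(0.0, spreading_gain=gain))
```

At the same input SNR, a larger spreading gain shifts the error rate curve toward lower error probability, which is the scaling described above.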

Figure 6 shows the suitability of on-line arithmetic for low precision decisions on higher precision data. Figure 6(a) shows the timing schedule of a conventional arithmetic full precision matched filter detector shown in Figure 2. An implementation based on conventional arithmetic has throughput as a logarithmic function of the precision, as both fast adders and multipliers can be built for computation in time logarithmic in the bit-precision [31],[32],[33]. Figure 6(b) shows the application of on-line arithmetic for a full precision matched filter implementation. As an example, a radix-4 system and 8-bit precision is assumed, giving 4 digits per input. Though the full precision implementation has a low latency, the throughput of the system for repeated operations is a linear function of the precision, in units of the on-line throughput time per digit. Note that one digit per computation is wasted as it signifies the end of the current computation precision (shown by RESET (R) in Figure 6, which clears all latches at the end of the computation). Also, additional zeros have to be inserted for full precision multiplication as the multiplier generates twice the output precision. However, as shown in Figure 6(c), the determination of the sign (the first non-zero MSD) can take variable time. This figure refers to the optimal (theoretical sign) detector, which waits until the first non-zero digit is received. As soon as the sign is detected, we can insert the RESET signal (R) for all the adder stages in the pipeline. We will then have idle stages in the pipeline during which the hardware could stop the clock for low power. Since the time for obtaining the MSD is variable, the time taken is almost the same as in part (b), and the area is also the same. The only advantage is due to the power savings during idle computations. If only the first few MSDs in the output are desired or, as in the case of truncated detection, the sign of the computation is obtained within the first two digits (the 2-MSD detector in Figure 6(d)), a higher throughput can be obtained along with savings in area.

Fig. 6. Timing comparisons for different arithmetic techniques. Part (a) shows the time taken by a conventional arithmetic circuit for the dot product with full precision. Part (b) shows the time taken by an on-line arithmetic module for calculating a dot product with full precision. Part (c) shows the time taken by an on-line arithmetic detector to find the sign of the dot product (variable throughput). Part (d) shows the time taken by a truncated-precision detector (2 MSDs) to find the sign.

III. RADIX-4 ON-LINE ADDER AND MULTIPLIER MODULES

This section describes the design of a radix-4 on-line adder and multiplier used in the detectors. As opposed to previous radix-4 on-line multiplier implementations [8],[34] that assume one of the inputs as constant, we design a generalized multiplier assuming both inputs as variable. The multiplier is also enhanced to accept inputs in a conventional number system so that the detector can directly interface with a conventional arithmetic circuit, if needed. The design of these modules was verified by simulation using the Xilinx Foundation Software for a schematic implementation on an FPGA. The area and throughput of the radix-4 adder and multiplier are determined in order to understand the area-throughput tradeoffs in the detectors.

A. Radix-4 on-line adder

The steps involved in fixed point on-line addition [4],[35] of 2 inputs, giving one output, are:

Initialization:

Recurrence:

for j = 0, 1, ..., m+1 do

(7)

where the recurrence variable is the residual and m is the number of digits in the inputs. The selection is done by rounding, using a discretization algorithm (DIS) [35] that performs the rounding based on the first few MSDs. The selection function is given by:

Since the on-line adder takes the output of an on-line multiplier and also feeds other on-line adders, we design the adder to be a general on-line adder, taking in redundant operands and producing a redundant result. The design of the on-line adder is shown in Figure 7. In the figure, higher indices on the inputs represent the more significant bits. The inputs are taken to be in two's complement radix-4 maximally redundant form (which is also the form in which the on-line multiplier produces its output). The residual is stored in Carry-Save (CS) form, as a residual sum and a residual carry, to avoid carry propagation delay. The design includes 3 Half Adders (HAs), 1 Full Adder (FA), 7 latches to store the residual and the output, and a 3-bit Carry Lookahead Adder (CLA) for the digit selection. This circuit has a delay of 8 gates and an area of 71 gates.

Fig. 7. A radix-4 on-line adder.
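The input/output behaviour of a radix-4 MSDF adder can be sketched in software. The Python fragment below is not the residual-based recurrence of equation (7) or the circuit of Figure 7; it swaps in the classical carry-free signed-digit addition rule (a transfer digit plus an interim digit per position), which yields the same kind of digit-serial, most-significant-digit-first behaviour with an on-line delay of one digit. The digit set {-3, ..., 3}, the weights 4^-1, 4^-2, ... and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

RADIX = 4


def online_add(xs, ys):
    """Yield the digits of x + y, most significant digit first.
    xs, ys: equal-length digit sequences in {-3..3}, weights 4^-1, 4^-2, ...
    The output has one extra leading digit of weight 4^0."""
    pending = None  # interim digit waiting for the transfer from the next position
    for x, y in zip(xs, ys):
        s = x + y  # position sum, in [-6, 6]
        if s >= 3:
            transfer, interim = 1, s - RADIX
        elif s <= -3:
            transfer, interim = -1, s + RADIX
        else:
            transfer, interim = 0, s
        # An output digit is complete once the transfer from the next, less
        # significant, position is known: one digit of on-line delay.
        yield transfer if pending is None else pending + transfer
        pending = interim
    if pending is not None:
        yield pending


def value(digits, first_exp):
    """Exact value of a digit string whose first digit has weight 4^-first_exp."""
    return sum(Fraction(d, RADIX ** (first_exp + k)) for k, d in enumerate(digits))


example = list(online_add([3, 3], [3, 3]))
print(example, value(example, 0))
```

Because the interim digit is kept within [-2, 2], adding a transfer of -1, 0 or 1 never leaves the digit set, so no carry propagates beyond one position.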

(8)

(9)

For comparison purposes, we assume the following:

Full Adder: Area = 5 gates, Delay = 2 gates

Half Adder: Area = 2 gates, Delay = 1 gate

3-bit CLA: Area = 21 gates, Delay = 5 gates

Latch: Area = 5 gates

We assume the time delay for a CLA adder of a given bit width and fan-in as

(10) as in [31],[32], and the area in gates can be calculated as

better tradeoff for throughput. Radix-2 multipliers [36],[37] can still be used if area is a more important constraint than throughput for the detector design.

Radix-4 on-line multipliers have also been designed for filter design problems and Discrete Cosine Transforms (DCT) [8],[34], where one of the inputs to the multiplier is considered constant. This is a reasonable assumption for filters, as the coefficients are not time-varying and the Signed Digit by Vector Multiplier (SDVM) multiplication is again avoided. However, since both the inputs in the detector are variable, we need to implement a general SDVM. The SDVM needed is a two's complement multiplier. In order to make the multiplication time precision-independent, we produce the outputs of the SDVM in CS form. The multiplier outputs in CS form are then added together with the residual using a compressor (parallel counter) [31],[33]. We then use the MSDs in the residual for the selection function as in [8],[34] to attain an on-line time independent of the precision.

Figure 8 shows the implementation of a radix-4 multiplier in two's complement representation. Since the on-line multiplier may interface to a conventional arithmetic circuit (see Figures 1 and 2), we design the multiplier to take in inputs in conventional arithmetic two's complement representation. The two inputs shown could represent the received signal from the A/D converter or the channel estimate from the channel estimator, each with a parallel to serial (P/S) converter. We internally add the appropriate sign digit to the two's complement number in our circuit, without any additional time overhead, to make the number redundant. The conversion blocks take 2 bits from each of the inputs in the conventional arithmetic system and convert them to a redundant number system. To do so, we note that all positive two's complement conventional numbers are valid in the redundant number system with a non-negative sign for every digit. All negative two's complement numbers are valid with a negative sign for the first digit and non-negative signs for the successive digits. The append blocks append the digits generated to form the vector for the SDVM as in equation (12).
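The observation about signs can be checked with a small sketch. The Python fragment below groups a two's complement word into pairs of bits, most significant pair first, and gives only the first radix-4 digit a negative contribution from the sign bit; it treats the word as an integer rather than a fraction, and the function names are hypothetical. It illustrates the idea only and is not the conversion logic of Figure 8.

```python
def twos_complement_to_radix4(bits):
    """bits: list of 0/1 values, sign bit first, with an even length.
    Returns radix-4 digits, most significant digit first."""
    if len(bits) % 2:
        raise ValueError("expected an even number of bits")
    digits = []
    for k in range(0, len(bits), 2):
        hi, lo = bits[k], bits[k + 1]
        if k == 0:
            digits.append(-2 * hi + lo)  # the sign bit carries negative weight
        else:
            digits.append(2 * hi + lo)   # remaining digits are never negative
    return digits


def digits_to_int(digits):
    """Evaluate a radix-4 digit string as an integer (Horner's rule)."""
    v = 0
    for d in digits:
        v = 4 * v + d
    return v


def int_to_bits(n, width=8):
    """Two's complement bits of n, most significant bit first."""
    return [(n >> i) & 1 for i in reversed(range(width))]


print(twos_complement_to_radix4(int_to_bits(-1)))
print(twos_complement_to_radix4(int_to_bits(100)))
```

Every digit produced this way lies in {-2, ..., 3}, which is inside the maximally redundant radix-4 digit set, so the converted word can be consumed by redundant-digit hardware without further adjustment.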

The SDVM is designed based on the modified Baugh-Wooley multiplier scheme [31],[38]. In a general multiplication with operands of equal precision, the Baugh-Wooley method does not require any increase in the maximum column height. But since the operands at the multiplier's input do not have the same precision, an additional stage of CSAs is needed to add the 1's at the bottom of the tree, increasing the delay of the partial product accumulation by 1 stage of CSAs. Since the multiplier outputs are now combined with the residual addition, the number of inputs to be added increases. Hence, a compressor is used for addition. Again, to make the on-line multiplication time independent of precision, the selection function uses only the most significant 4 bits to determine the result, as shown in Figure 8.

The need for an SDVM and a compressor makes the on-line time required for multiplication greater than that for addition. Since the on-line multiplier output feeds the on-line adders, this will leave the adder starved for data. Hence, in order to match the delays of the adder and multiplier, we pipeline the design further as shown by the dotted lines in Figure 8. This modified design has a delay equal to the delay of the compressor and the HAs. The pipeline stages increase the on-line delay of the multiplier from 1 to 3, resulting in higher throughput at the expense of larger latency. The large area requirement of the complete on-line multiplier, even for a digit-serial design, is due to the need for an SDVM and the large number of latches required to pipeline the design. If this area requirement exceeds the detector architecture specifications, then a radix-2 multiplier can be used at the expense of throughput.

(13)

(14)

(15)

Fig. 8. A general radix-4 on-line multiplier.

Fig. 9. Code matched filter with different arithmetic schemes. Part (a) shows a simple implementation using conventional arithmetic. Part (b) shows a conventional implementation using CSAs (a CSA tree). Part (c) shows an on-line arithmetic implementation.

multiplier as building blocks for calculating their area-time requirements and comparisons with conventional arithmetic.

IV. CONVENTIONAL AND ON-LINE IMPLEMENTATION OF A MATCHED FILTER DETECTOR

We discuss different implementation schemes for a code matched filter detector using a simple conventional arithmetic implementation, a time-constrained conventional arithmetic implementation using carry-save adders and an on-line arithmetic implementation. These are shown in Figure 9. In all the architectures discussed below, we present time-constrained versions in order to attain the maximum data rates and then compare the area requirements for these architectures to determine their utility in a low power architecture design.