Parallel IP lookup using multiple SRAM-based pipelines

Authors: Weirong Jiang and Viktor K. Prasanna. Presenter: Yi-Sheng Lin (林意勝).
To minimize latency, the circuit was designed to ripple a most-significant-prefix (MSP) comparison from the most-significant slice to the least-significant slice and, in parallel, to ripple a least-significant-prefix (LSP) comparison in the opposite direction. Each slice thus collects information from the slices above and below it in the cascade, and all slices produce consistent comparison results. The logic for this compare can be designed with standard techniques. In the context of IP routing, this architecture allows 32-bit IPv4 lookups and 128-bit IPv6 lookups to be done simultaneously using the same hardware.
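As a rough software model of comparing a wide key slice by slice, the sketch below performs a three-way compare in which the most-significant differing slice decides the result. It is an illustration only: the function names and the 32-bit slice width are assumptions, and the loop models just the ripple from the most-significant slice downward, not the bidirectional MSP/LSP ripple of the circuit.

```python
def split_slices(value, num_slices, slice_bits):
    """Split an integer into fixed-width slices, most-significant slice first."""
    mask = (1 << slice_bits) - 1
    return [(value >> (slice_bits * i)) & mask
            for i in range(num_slices - 1, -1, -1)]


def cascaded_compare(key, entry, num_slices=4, slice_bits=32):
    """Three-way compare of two wide values, one slice at a time.

    Returns -1 if key < entry, 0 if equal, 1 if key > entry.
    The slice width (32 bits) is an assumed value for illustration.
    """
    result = 0
    key_slices = split_slices(key, num_slices, slice_bits)
    entry_slices = split_slices(entry, num_slices, slice_bits)
    for k, e in zip(key_slices, entry_slices):
        # Once a more significant slice has decided the outcome,
        # less significant slices cannot change it.
        if result == 0 and k != e:
            result = -1 if k < e else 1
    return result
```

With four 32-bit slices the same function covers a 128-bit key; a 32-bit key simply leaves the upper three slices at zero.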

In addition, Ethernet lookups, which use their own key width, can be done with this architecture. Four bits of TCAM resource are also added to enable programmable selection of a partition depending on packet characteristics. This enables efficient implementation of virtual routers, in which many different routing tables need to be maintained. Exact match search is important, for example, in the context of IP routing for identifying the specific machine to which a packet is addressed. Prefix match search is like exact match, except that only a leading portion of the key has to match. Again referring to the context of IP routing for an example, many routing algorithms only need to look at the top 24 bits of a 32-bit value to make certain decisions.

Whatever the PM field is at the end of the pipeline is compared against a field of the action SRAM to determine whether the lookup key satisfied the prefix match. So, for example, there might be a requirement that a specific action be taken, or that a particular rule be valid, only if the key is within some range defined by the prefix match. Put another way, an n-bit range would all map to the same result. According to another class of embodiments, a binary search pipeline constructed in accordance with the invention may be configured to implement another special case of prefix match, namely a range table.

A range table is not concerned with the upper bound, but instead with whether the key is greater than or equal to a particular entry and less than the next entry. This can be accomplished by disabling the prefix match comparison. According to a particular class of embodiments, this configurability is implemented in the comparison circuit, which does both a 3-way compare and a prefix length compare. Thus, when the end of the pipeline is reached, both the final index and the number of bits matched may be identified.
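The lookup described above can be sketched in software as a binary search that returns both the final index and the number of leading bits matched. This is a minimal sketch under stated assumptions: the function names, the parallel `prefix_lengths` list standing in for the action SRAM field, and the 32-bit key width are all hypothetical, and the sketch does not handle nested or overlapping prefixes.

```python
def matched_prefix_bits(a, b, width):
    """Number of leading bits (0..width) on which a and b agree."""
    return width - (a ^ b).bit_length()


def range_lookup(sorted_keys, key, width=32):
    """Find the last entry <= key (range table behaviour).

    Returns (index, bits_matched), or (None, 0) if key is below every entry.
    """
    lo, hi = 0, len(sorted_keys)
    index = None
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_keys[mid] <= key:
            index = mid          # candidate: entry <= key
            lo = mid + 1
        else:
            hi = mid
    if index is None:
        return None, 0
    return index, matched_prefix_bits(sorted_keys[index], key, width)


def prefix_lookup(sorted_keys, prefix_lengths, key, width=32):
    """Range lookup followed by the prefix-length check.

    prefix_lengths[i] plays the role of the required match length stored
    alongside entry i (an assumed layout). Returns the index or None.
    """
    index, bits = range_lookup(sorted_keys, key, width)
    if index is None or bits < prefix_lengths[index]:
        return None
    return index
```

Calling `range_lookup` alone corresponds to disabling the prefix match comparison; `prefix_lookup` adds the final check on the number of bits matched.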

From the foregoing, it will be appreciated that the binary search pipeline can be configured rule-by-rule. The current design costs some extra area in that it requires area for the comparators. For example, if there are 15 stages of comparison in the SRAM section, each requiring a prefix compare, and 4 slices are needed to reach the full key width, there are actually 60 comparators. Thus, the hardware can do all three modes (exact match, prefix match, and range lookup). According to some embodiments, additional conditions may be applied in the discrete logic at the root of the binary search circuit. Because the root is not that big, advantageous features with such additional conditions may be added without adding significant overhead to the overall circuit.

One preferred implementation is 4 extra bits of key and 8 extra configuration bits. The 8 bits are used like a TCAM: for each of the top 4 bits of the key, they encode whether the bit must be 0, must be 1, or is a don't-care in order for the partition to be valid. In the context of IP routing, these condition bits might, for example, identify the type of packet being processed. Depending on these conditions, specific partitions of the binary search circuit may be selected.

Alternatively, half could be used for IP multicast, and the other half for IP unicast. Thus, the same hardware resources may be used for different purposes.
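The TCAM-like partition selection can be illustrated with a small sketch. It assumes one possible encoding of the 8 configuration bits, a 4-bit care mask plus a 4-bit required value; the encoding, the function names, and the partition names are hypothetical and chosen only for illustration.

```python
def partition_valid(condition_bits, care_mask, value):
    """True if every cared-about condition bit equals the configured value.

    care_mask bit = 1 means the bit must match `value`;
    care_mask bit = 0 means don't care.
    """
    return (condition_bits & care_mask) == (value & care_mask)


def select_partitions(condition_bits, partitions):
    """Return the names of all partitions whose condition matches.

    `partitions` is a list of (name, care_mask, value) tuples.
    """
    return [name for name, care_mask, value in partitions
            if partition_valid(condition_bits, care_mask, value)]


# Hypothetical configuration: the top condition bit separates
# unicast from multicast; the last partition accepts any packet.
PARTITIONS = [
    ("unicast",   0b1000, 0b0000),
    ("multicast", 0b1000, 0b1000),
    ("any",       0b0000, 0b0000),
]
```

For example, `select_partitions(0b1010, PARTITIONS)` selects the multicast and catch-all partitions, because only the top bit is examined by the first two rules.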

A wide variety of other possibilities for using such extra condition bits will be appreciated by those of skill in the art. According to a particular implementation, in addition to the prefix match comparison, additional information is provided that maps to a wide variety of action options. Such action options might include, for example, routing options, dropping, counting, policing, rate throttling, changing frame priority, changing frame VLANs, routing frames, switching frames, sending frames to a processor, etc. If a sufficient number of bits of the key do not match, […] In the context of frame processing pipelines in packet switches, the conventional approach to achieving this kind of functionality and configurability is to use off-chip TCAMs, which are immensely power hungry and represent significant additional cost.

In addition, such off-chip TCAMs don't provide the action table. By contrast, a binary search pipeline implemented according to specific embodiments of the invention uses considerably less power than a TCAM, has low latency, and can reside on-chip along with the action tables, with as many as a million or more entries in current generations.

And these different scenarios could occur using the same hardware on consecutive lookups. The binary search pipeline is almost as functional as a TCAM, with the exception that it can't mask out higher-order bits. That is, a TCAM performs a masked compare, in which bits are masked and a compare for exact match is performed. The binary search pipeline performs an ordered compare, and some number of bits can then be added on top. It can perform the same function in a frame processing pipeline as a TCAM, with the advantage that it is considerably more area and power efficient. The binary search pipeline and TCAM can work very well together.

For example, adding entries to a binary search pipeline is typically a cumbersome process that involves sorting within a scratch partition, copying the scratch partition in, and then possibly sorting the partitions. This takes considerable time and processing resources.

On the other hand, if the binary search pipeline is combined with a TCAM, entries can be added to the TCAM quickly and the change can take effect right away. This can be done until there are a sufficient number of new entries to be added to the binary search pipeline as a batch. Effectively, the TCAM acts as a temporary cache for modified entries. According to various embodiments of the invention, the separation between how the compares are done versus how the keys are stored takes advantage of the decreasing size of SRAM over time.
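The batching idea can be modelled in a few lines. In this sketch a Python set stands in for the TCAM and a sorted list stands in for the binary search pipeline; the class name, the `batch_size` threshold, and the choice to flush automatically are assumptions made for illustration, not details taken from the text.

```python
import bisect


class HybridTable:
    """Sorted array for bulk entries plus a small overflow store for new ones."""

    def __init__(self, entries=(), batch_size=4):
        self.sorted_keys = sorted(entries)  # models the binary search pipeline
        self.pending = set()                # models the TCAM used as a cache
        self.batch_size = batch_size        # assumed flush threshold

    def add(self, key):
        """New entries take effect immediately via the overflow store."""
        self.pending.add(key)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Merge pending entries into the sorted array as one batch."""
        self.sorted_keys = sorted(set(self.sorted_keys) | self.pending)
        self.pending = set()

    def contains(self, key):
        """Check the overflow store first, then binary search the array."""
        if key in self.pending:
            return True
        i = bisect.bisect_left(self.sorted_keys, key)
        return i < len(self.sorted_keys) and self.sorted_keys[i] == key
```

The expensive re-sort happens once per batch rather than once per update, while lookups see every added entry right away.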

That is, with such embodiments, most of the area is in the storage of the keys. The SRAMs keep getting larger as you progress down the pipeline. Thus, for a small binary search pipeline, the area may be dominated by comparison circuitry. However, as the binary search pipeline gets larger, the percentage of area attributable to SRAM grows. Given improvements in SRAM area over time, the binary search pipeline is only going to get better in terms of area.
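A toy cost model makes the trend concrete. It assumes the textbook layout in which stage k of a pipelined binary search stores 2^k keys and each stage has one fixed-cost comparator; the cost figures are arbitrary units chosen for illustration and are not taken from the text.

```python
def stage_entries(num_stages):
    """Keys stored per stage, assuming stage k holds 2**k keys."""
    return [2 ** k for k in range(num_stages)]


def sram_share(num_stages, comparator_cost, cost_per_entry):
    """Fraction of total cost attributable to key storage.

    comparator_cost and cost_per_entry are hypothetical unit costs.
    """
    storage = sum(stage_entries(num_stages)) * cost_per_entry
    compare = num_stages * comparator_cost
    return storage / (storage + compare)
```

Under these assumptions storage grows exponentially with depth while comparator cost grows only linearly, so the storage share rises quickly as stages are added.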

By contrast, for TCAMs, the area overhead is linearly proportional to the number of entries. Embodiments of the present invention are particularly useful for applications in which the values in the array can be sorted in advance, and in which the values in the array don't change much relative to the number of lookups performed.

For example, IP packet routing employs routing tables which are typically modified every minute or two as compared to the hundreds of millions or even billions of times per second lookups are performed. Thus, the computational cost of sorting the list every time it is updated is well worth it when one considers the latency and power savings that such an approach represents as compared to more conventional approaches such as, for example, content-addressable memories which employ a power hungry, brute force approach.

It will be understood that the functionalities described herein may be implemented in a wide variety of contexts using a wide variety of technologies without departing from the scope of the invention. That is, embodiments of the invention may be implemented in processes and circuits which, in turn, may be represented, without limitation, in software (object code or machine code), in varying stages of compilation, as one or more netlists, in a simulation language, in a hardware description language, by a set of semiconductor processing masks, and as partially or completely realized semiconductor devices.

The various alternatives for each of the foregoing as understood by those of skill in the art, for example the various types of computer-readable media and software languages, are also within the scope of the invention.

Embodiments of the invention are described herein with reference to switching devices, and specifically with reference to packet or frame switching devices.

IPDPS

1. Introduction
2. Related Work
3. Architecture Overview
4. Memory Balancing
5. Traffic Balancing
6. Performance Evaluation
7.

Parallel IP lookup using multiple SRAM-based pipelines

The memory distribution over different pipelines as well as across different stages of each pipeline must be balanced. The traffic among these pipelines should be balanced.
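One generic way to approach the memory balancing requirement is a greedy heuristic that always places the largest remaining block on the least-loaded pipeline. The sketch below is such a heuristic under stated assumptions: it is not claimed to be the authors' algorithm, and the function name, block names, and sizes are hypothetical.

```python
import heapq


def balance(sizes, num_pipelines):
    """Greedily assign memory blocks to pipelines, largest block first.

    `sizes` maps a block name to its memory size. Returns
    (assignment, loads), both keyed by pipeline index.
    """
    heap = [(0, p) for p in range(num_pipelines)]  # (current load, pipeline)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_pipelines)}
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(heap)              # least-loaded pipeline
        assignment[p].append(name)
        heapq.heappush(heap, (load + size, p))
    loads = {p: sum(sizes[n] for n in names) for p, names in assignment.items()}
    return assignment, loads
```

The same loop could balance traffic instead of memory by using per-block traffic estimates as the sizes, although balancing both at once would need a more careful formulation.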