Accelerating Nios II Networking Applications

AN Application Note

This application note describes key optimizations you can use to accelerate the performance of your Nios II networking application. In addition, this document describes how the different parts of a Nios II Ethernet-enabled system work together, how the interaction of these parts determines the total networking performance of the system, and how to benchmark the system.

Ethernet is a standard data transport paradigm for embedded systems across all applications because it is inexpensive, abundant, mature, and reliable.

Downloading the Ethernet Acceleration Design Example

The Nios II Ethernet acceleration design example is an integral part of this application note. The design example shows how the acceleration techniques can be applied in a real working Nios II system. The readme.doc file, located in the design example folder, provides additional hands-on instructions that demonstrate how to implement the acceleration techniques in a Nios II system. The readme.doc file also provides performance benchmark results.

You can find the Nios II Ethernet acceleration design example on the Nios II Ethernet Acceleration Design Example page of the Altera website. Download the design example file, and unzip the file into a working directory.

The Structure of Networking Applications

This section describes the different parts of a general networking application.

Ethernet System Hierarchy

Figure 1 shows the flow of information from an embedded networking application to the Ethernet.

Figure 1. The Ethernet System Hierarchy
[Figure: layers from top to bottom: Application; Transport Protocol and Internet Protocol (TCP/IP Stack); MAC and PHY (Ethernet LAN Controller); Ethernet]

101 Innovation Drive, San Jose, CA
Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries.
All other words and logos identified as trademarks or service marks are the property of their respective holders as described at [URL missing]. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.
ISO 9001:2008 Registered
January 2013 Altera Corporation

The structure presented in Figure 1 shows a typical embedded networking system. In general, a user application performs a job that defines the goal of the embedded system, such as controlling the speed of a motor or providing the UI for an embedded kiosk.

The networking stack provides the application with an application programming interface (API), usually the Sockets API, to send networking data to and from the embedded system. The stack itself is a software library that converts data from the user application into networking packets, and sends the packets through the networking device. Networking stacks tend to be very complicated software state machines that must be able to send data using a wide variety of networking protocols, such as address resolution protocol (ARP), transmission control protocol (TCP), and user datagram protocol (UDP). These stacks generally require a significant amount of processing power.

The stack uses the Ethernet device to move data across the physical media. Most of a networking stack's interaction with the networking device consists of shuttling Ethernet packets to and from the Ethernet device.

You must consider the link layer, or physical media over which the Ethernet datagrams travel, when constructing a network-enabled system. Depending on the location of the embedded system, the Ethernet datagrams might traverse a wide variety of physical links, such as 10/100 Mb twisted pair and fiber optic. Additionally, the datagrams might experience latency if they traverse long distances or need to pass through many network switches in order to arrive at their destination.

Relationships Between Networking System Elements

The total throughput performance of an embedded networking system is highly dependent on the interaction of the user application, networking stack, Ethernet device (and driver), as well as the physical connection for the networking link. Making substantial performance improvements in the network throughput often depends on optimizing the performance of all these elements simultaneously.

In general, your networking application has some criteria for performance that are either achieved or not. However, a good first-order approximation for determining the viability of your networking application is to remove the user application from the system and measure the total networking performance. This method provides you with an upper bound for total network performance, which you can use to create your networking application.

This application note uses a simple benchmark program that determines the raw throughput rate of TCP and UDP data transactions. This benchmark application does very little apart from sending or receiving data through the networking stack. It therefore provides us with a good approximation of the maximum networking performance achievable.

Finding the Performance Bottlenecks

A wide variety of tools are available for analyzing the performance of your Nios II embedded system and finding system bottlenecks.
In this application note, many of the techniques presented to increase overall system (and networking) performance were discovered through the use of the following tools:

- GNU profiler
- Timer peripheral IP core
- Performance counter IP core

This application note does not explore the use of these tools or how they were applied to find networking bottlenecks in the system. For more information about finding general performance bottlenecks in your Nios II embedded system, refer to Profiling Nios II Systems.

The User Application

In an embedded networking system, the application layer is the part of the system where your key task is performed. In general, this application layer performs some work and then uses the network stack to send and receive data. In a classic embedded networking system, your application executes on the same processor as the network stack, and competes with it for computation resources.

To increase the throughput of your networking system, decrease the time your application spends completing its task between the function calls it makes to the networking stack. This technique has a twofold benefit. First, the faster your application runs to completion before sending or receiving data, the more function calls it can make to the networking stack (Sockets API) to move data across the network. Second, the less of the processor's time the application takes to run, the more time the processor has to operate the networking stack (and networking device) and transmit the data.

User Application Optimizations

This section describes some effective ways to decrease the amount of time your application uses the Nios II processor.

Software Optimizations

Compiler Optimization Level

Compile your application with the highest compiler optimization level possible. Higher optimization levels generally result in denser, faster code, increasing the computational efficiency of the processor.

MicroC/OS-II Thread Priority

Make sure that your application task has the right MicroC/OS-II priority level assigned to it. In general, the higher the priority of the application, the faster it runs to completion. Balance the application's priority levels against the priority levels assigned to the NicheStack's core tasks, discussed in "Structure of the NicheStack Networking Stack" on page 7.

Note: This suggestion assumes that your application uses Altera's recommended method for operating the NicheStack Networking Stack, which requires using the MicroC/OS-II operating system.

Hardware Optimizations

Processor Performance

You can increase the performance of the Nios II processor in the following ways:

Computational Efficiency

Selecting the most computationally efficient Nios II processor core is the quickest way to improve overall application performance. The following Nios II processor cores are available, in decreasing order of performance:

- Nios II/f: optimized for speed
- Nios II/s: balances speed against usage of on-chip resources
- Nios II/e: conserves on-chip resources at the expense of speed

Memory Bandwidth

Using low-latency, high-speed memory decreases the amount of time required by the processor to fetch instructions and move data. Additionally, increasing the processor's arbitration share of the memory increases the processor's performance by allowing the Nios II processor to perform more transactions to the memory before another Avalon master port can assume control of the memory.

Instruction and Data Caches

Adding an instruction and data cache is an effective way to decrease the amount of time the Nios II processor spends performing operations, especially in systems that have slow memories, such as SDRAM or double data rate (DDR) SDRAM. In general, the larger the cache size selected for the Nios II processor, the greater the performance improvement.

Clock Frequency

Increasing the speed of the processor's clock results in more instructions being executed per unit of time. To gain the best performance possible, ensure that the processor's execution memory is in the same clock domain as the processor, to avoid the use of clock-crossing adapters.

One of the easiest ways to increase the operational clock frequency of the processor and memory peripherals is to use a pipeline bridge IP core to isolate the slower peripherals of the system. With this peripheral, the processor, memory, and Ethernet device are connected on one side of the bridge. On the other side of the bridge are all of the peripherals that are not performance dependent.

Hardware Acceleration

Hardware acceleration can provide tremendous performance gains by moving time-intensive processor tasks to dedicated hardware blocks in the system. The following list contains the most common ways to accelerate application-level algorithms:

- Custom Instruction: Offload the Nios II processor by using hardware to implement a custom instruction.
- Custom Peripheral: Create a block of hardware that performs a specific algorithmic task, as a peripheral controlled by the Nios II processor.

For more information about hardware optimizations, refer to the SOPC Builder Design Optimizations and Hardware Acceleration and Coprocessing chapters of the Embedded Design Handbook.

The Sockets API

After tuning your application to become more computationally efficient (thereby freeing more of the processor's time for operating the networking stack), you can optimize how the application uses the networking stack. This section describes how to select the best protocol for use by your application and the most efficient way to use the Sockets API.

Selecting the Right Networking Protocol

When using the Sockets API, you must also select which protocol to use for transporting data across the network. There are two main protocols used to transport data across networks: TCP and UDP. Both of these protocols perform the basic function of moving data across Ethernet networks, but they have very different implementations and performance implications. Table 1 compares the two protocols.

Table 1. The UDP and TCP Protocols

Parameter                        UDP                     TCP
Connection Mode                  Connectionless          Connection-Oriented
In Order Data Guarantee          No                      Yes
Data Integrity and Validation    No                      Yes
Data Retransmission              No                      Yes
Data Checksum                    Yes; can be disabled    Yes

In terms of just throughput performance, the UDP protocol is much faster than TCP because it has very little overhead. The UDP protocol makes no attempt to validate that the data being sent arrived at its destination (or even that the destination is capable of receiving packets), so the network stack needs to perform much less work in order to send or receive data using this protocol. However, aside from very specialized cases where your embedded system can tolerate losing data (for example, streaming multimedia applications), use the TCP protocol.

Design Tip: Use the UDP protocol to gain the fastest performance possible; however, use the TCP protocol when you must guarantee the transmission of the data.

Improving Send and Receive Performance

Proper use of the Sockets API in your application can also increase the overall networking throughput of your system. The following list describes several ways to optimally use the Sockets API:

Minimize send and receive function calls

The Sockets API provides two sets of functions for sending and receiving data through the networking stack. For the UDP protocol these functions are sendto() and recvfrom(). For the TCP protocol these functions are send() and recv().
Depending on which transport protocol you use (TCP or UDP), your application uses one of these sets of functions. To increase overall performance, avoid calling these functions repetitively to handle small units of data. Every call to these functions incurs a fixed time penalty for execution, which can compound quickly when these functions are called multiple times in rapid succession. Combine data that you want to send (or receive) and call these functions with the largest possible amount of data at one time.

Design Tip: Call the Sockets API's send and receive functions with larger buffer sizes to minimize system call overhead.

Minimize latency when sending data

Although the TCP Sockets send() function can accept an arbitrary number of bytes, those bytes might not be immediately sent as a packet. This situation is especially likely when send() is called with a small number of bytes, because the networking stack attempts to coalesce these small data chunks into a larger packet. Small data chunks are coalesced to avoid congesting the network with many small packets (using the Nagle algorithm for congestion avoidance).

There is a solution, however, through the use of the TCP_NODELAY flag. Setting a socket's TCP_NODELAY flag, with the setsockopt() function call, disables the Nagle algorithm. The socket immediately sends whatever bytes are passed in as a TCP packet. Disabling the Nagle algorithm can be a useful way to increase network throughput in the case where your application must send many small chunks of data very quickly.

Design Tip: If you need to accelerate the transmission of small TCP packets, use the TCP_NODELAY flag on your socket. You can find an example of setting the TCP_NODELAY flag in the benchmarking application software in the Nios II Ethernet acceleration design example.

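As an illustration of the TCP_NODELAY flag described above, the following is a minimal sketch, not the code from the design example. It assumes a POSIX-style host with the standard BSD sockets headers; a NicheStack build would pull the same calls in through the stack's own headers instead. The helper names disable_nagle() and nagle_disabled() are hypothetical and exist only for this example.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* socket(), setsockopt(), getsockopt() */
#include <unistd.h>       /* close() */

/* Hypothetical helper: set TCP_NODELAY on a TCP socket, which disables
 * the Nagle algorithm. Returns 0 on success, -1 on error. */
int disable_nagle(int sock)
{
    int flag = 1;  /* nonzero enables TCP_NODELAY */
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: report whether TCP_NODELAY is currently set.
 * Returns 1 if set, 0 if clear, -1 on error. */
int nagle_disabled(int sock)
{
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) {
        return -1;
    }
    return flag != 0;
}
```

An application would typically call disable_nagle() once, right after creating or accepting the socket and before the first send(). Because the option applies per socket, other connections in the same system can keep the Nagle algorithm enabled.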
Note: While disabling the Nagle algorithm usually causes smaller packets to be immediately sent over the network, the networking stack might still coalesce some of the packets into larger packets. This situation is especially likely in the case of the Windows workstation platform. However, you can expect the networking stack to do so with much lower frequency than if the Nagle algorithm were enabled.

The Zero Copy API

The NicheStack networking stack provides a further optimization, called the zero copy API, to accelerate data transfers to and from the stack. The zero copy API increases overall system performance by eliminating the buffer management scheme performed by the Sockets API's read and write function calls. The application manages the send and receive data buffers directly, eliminating an extra level of data copying performed by the Nios II processor.

This application note does not discuss details of performance optimization with the zero copy API. Refer to the Appendix on page 16 for pointers to more information.

Design Tip: Using the NicheStack Zero Copy API can accelerate your network application's throughput by eliminating an extra layer of copying.

Structure of the NicheStack Networking Stack

The NicheStack networking stack is a highly configurable software library designed for communicating over TCP/IP networks. The version that Altera ships in the Nios II Embedded Design Suite (EDS) is optimized for use with the MicroC/OS-II real-time operating system (RTOS), and includes device driver support for the Altera Triple Speed Ethernet MegaCore function, which serves as the media access control (MAC). The entire software library is configured through a single header file, called ipport.h.

General Optimizations

Because this application note focuses on a single Nios II system, most of the optimizations described in "User Application Optimizations" on page 3 also improve the performance of the NicheStack networking stack. The following optimizations also help increase your overall network performance:

- Software optimizations
  - Compiler optimization level
- Hardware optimizations
  - Processor performance
  - Computational efficiency
  - Memory bandwidth
  - Instruction and data caches
  - Clock frequency

NicheStack Specific Optimizations

This section describes the targeted optimizations that you can use to increase the performance of the NicheStack networking stack directly.

NicheStack Thread Priorities

Altera's version of the NicheStack networking stack relies on the MicroC/OS-II operating system's threads to drive two critical tasks to properly service the networking stack. These tasks (threads) are tk_nettick, which is responsible for timekeeping, and tk_netmain, which is used to drive the main operation of the stack.

When building a NicheStack-based system in the Nios II EDS, the default run-time thread priorities assigned to these tasks are: tk_netmain = 2 and tk_nettick = 3. These thread priorities provide the best networking performance possible for your system. However, in your embedded system you might need to override these priorities because your application task (or tasks) run more frequently than these tasks.

Overriding these priorities, however, might result in performance degradation of network operations, as the NicheStack networking stack has fewer processor cycles to complete its tasks. Therefore, if you need to increase the priority of your application tasks above that of the NicheStack tasks, make sure to yield control whenever possible to ensure that these tasks get some processor time.
Additionally, ensure that the tk_netmain and tk_nettick tasks have priority levels that are just slightly less than the priority level of your critical system tasks.

When you yield control, the MicroC/OS-II scheduler places your application task from a running state into a waiting state. The scheduler then takes the next ready task and places it into a running state. If tk_netmain and tk_nettick are the higher priority tasks, they are allowed to run more frequently, which in turn increases the overall performance of the networking stack.

Design Tip: If your MicroC/OS-II based application tasks run with a higher priority level (lo