ABSTRACT
We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput (by up to 36.4 Gbps, i.e., 70%) and reduces latency (by up to 101 µs, i.e., 28%), and enables nontrivial packet processing (e.g., a router) at ~100 Gbps, when new packets arrive >10× faster than main memory access times, while using only one processing core.
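To make the ">10× faster than main memory access times" claim concrete, the following back-of-the-envelope sketch computes the gap between back-to-back packets on a saturated link. The abstract does not state the packet size or the memory latency; the 64-byte minimum Ethernet frame, the 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap), and the 70 ns DRAM access time used here are illustrative assumptions, not figures from the paper.

```python
def inter_arrival_ns(link_gbps: float, frame_bytes: int, overhead_bytes: int = 20) -> float:
    """Time between back-to-back frames on a saturated link, in nanoseconds.

    overhead_bytes defaults to the standard Ethernet wire overhead:
    7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    # 1 Gbps is exactly 1 bit per nanosecond, so bits / Gbps yields nanoseconds.
    return bits_on_wire / link_gbps


# Assumption for illustration only: a typical main-memory access latency.
ASSUMED_DRAM_NS = 70.0

if __name__ == "__main__":
    gap = inter_arrival_ns(100, 64)  # assumed: 64-byte frames at 100 Gbps
    print(f"inter-arrival gap: {gap:.2f} ns")
    print(f"DRAM latency / gap: {ASSUMED_DRAM_NS / gap:.1f}x")
```

Under these assumptions a new packet arrives roughly every 6.7 ns, so a single access to main memory costs about ten packet slots. Larger frames widen the gap considerably, which is why the small-packet case is the demanding one.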
PacketMill: toward per-Core 100-Gbps networking
Published in
ASPLOS ’21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
research-article
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
Copyright for syndicated content belongs to the linked source: Hacker News – https://dl.acm.org/doi/abs/10.1145/3445814.3446724