Accelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects

Slides are available from the "Accelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects" tutorial at HotI'15, by Dhabaleswar K. (DK) Panda and Xiaoyi Lu, The Ohio State University.

Introduction to Big Data Applications and Analytics
- Big Data has become one of the most important elements of business analytics
- Provides groundbreaking opportunities for enterprise information management and decision making
- The amount of data is exploding; companies are capturing and digitizing more information than ever
- The rate of information growth appears to be exceeding Moore's Law

4/5V Characteristics of Big Data
- Commonly accepted 3V's of Big Data: Volume, Velocity, Variety
  (Michael Stonebraker: "Big Data Means at Least Three Different Things")
- 4/5V's of Big Data: 3V + Veracity, Value

Big Volume of Data by the End of this Decade
- From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes.
- By 2020, a third of the data in the digital universe (more than 13,000 exabytes) will have Big Data value, but only if it is tagged and analyzed.
  (Courtesy: John Gantz and David Reinsel, "The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East," IDC's Digital Universe Study, sponsored by EMC, December)

Data Generation in Internet Services and Applications
- Webpages (content, graph)
- Clicks (ad, page, social)
- Users (OpenID, FB Connect, etc.)
- Emails (Hotmail, Y!Mail, Gmail, etc.)
- Photos, movies (Flickr, YouTube, Video, etc.)
- Cookies / tracking info (see Ghostery)
- Installed apps (Android Market, App Store, etc.)
- Location (Latitude, Loopt, Foursquare, Google Now, etc.)
- User-generated content (Wikipedia & co, etc.)
- Ads (display, text, DoubleClick, Yahoo, etc.)
- Comments (Disqus, Facebook, etc.)
- Reviews (Yelp, Y!Local, etc.)
- Third-party features (e.g. Experian)
- Social connections (LinkedIn, Facebook, etc.)
- Purchase decisions (Netflix, Amazon, etc.)
- Instant messages (YIM, Skype, GTalk, etc.)
- Search terms (Google, Bing, etc.)
- Timestamp (everything)
- News articles (BBC, NYTimes, Y!News, etc.)
- Blog posts (Tumblr, Wordpress, etc.)
- Microblogs (Twitter, Jaiku, Meme, etc.)
- Link sharing (Facebook, Delicious, Buzz, etc.)
- Network traffic

Number of Apps in the Apple App Store, Android Market, Blackberry, and Windows Phone (2013)
- Android Market: ~1200K
- Apple App Store: ~1000K

Velocity of Big Data
- How much data is generated every minute on the Internet?
- The global Internet population grew 14.3% from 2011 to 2013 and now represents 2.4 billion people.

Not Only in Internet Services - Big Data in Scientific Domains
- Scientific data management, analysis, and visualization
- Application examples: climate modeling, combustion, fusion, astrophysics, bioinformatics
- Data-intensive tasks:
  - Run large-scale simulations on supercomputers
  - Dump data on parallel storage systems
  - Collect experimental / observational data
  - Move experimental / observational data to analysis sites
  - Visual analytics help understand data visually

Typical Solutions or Architectures for Big Data Analytics
- Hadoop: The most popular framework for Big Data analytics (HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc.)
- Spark: Provides primitives for in-memory cluster computing; jobs can load data into memory and query it repeatedly
- Storm: A distributed real-time computation system for real-time analytics, online machine learning, continuous computation, etc.
- S4: A distributed system for processing continuous unbounded streams of data
- GraphLab: Consists of a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
- Web 2.0: RDBMS + Memcached; Memcached is a high-performance, distributed memory object caching system

Data Management and Processing on Modern Clusters
- Substantial impact on designing and utilizing modern data management and processing systems in multiple tiers
- Front-end data accessing and serving (online): Memcached + DB (e.g. MySQL), HBase
- Back-end data analytics (offline): HDFS, MapReduce, Spark

Who Is Using Hadoop?
- Focuses on large data and data analysis
- The Hadoop environment (e.g. HDFS, MapReduce, RPC, HBase) is gaining a lot of momentum

Presentation Outline
- Overview of MapReduce and RDD Programming Models
- Apache Hadoop, Spark, and Memcached
- Modern Interconnects and Protocols
- Challenges in Accelerating Hadoop, Spark, and Memcached
- Benchmarks and Applications using Hadoop, Spark, and Memcached
- Acceleration Case Studies and In-Depth Performance Evaluation
- The High-Performance Big Data (HiBD) Project and Associated Releases
- Ongoing/Future Activities for Accelerating Big Data Applications
- Conclusion and Q&A

The MapReduce Model
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (OSDI '04).

WordCount Execution
- The overall execution process of WordCount in MapReduce

A Hadoop MapReduce Example - WordCount
(Scalable, productive, fault-tolerant; the full example is 63 lines of code)

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}

Data Sharing Problems in MapReduce
- Each iteration goes through HDFS: HDFS read -> iter. 1 -> HDFS write -> HDFS read -> iter. 2 -> HDFS write -> ...
- Slow due to replication, serialization, and disk I/O
- What if data could be kept in memory between iterations?
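Outside the Hadoop runtime, the same map-shuffle-reduce dataflow can be sketched in plain Java; the class and method names below are illustrative, not part of any Hadoop API.

```java
import java.util.*;

// Toy single-process illustration of the WordCount dataflow:
// map emits (word, 1) pairs, the shuffle groups pairs by key,
// and reduce sums the values for each key.
public class WordCountSketch {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit (word, 1) for every token.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    emitted.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        // Shuffle phase: group emitted values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : emitted) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Reduce phase: sum the grouped values per key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> kv : grouped.entrySet()) {
            int sum = 0;
            for (int v : kv.getValue()) sum += v;
            counts.put(kv.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(Arrays.asList("to be or", "not to be"));
        System.out.println(counts);  // {be=2, not=1, or=1, to=2}
    }
}
```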
- In-memory data sharing across iterations is faster than network and disk

RDD Programming Model in Spark
- Key idea: Resilient Distributed Datasets (RDDs)
- Immutable distributed collections of objects that can be cached in memory across cluster nodes
- Created by transforming data in stable storage using data-flow operators (map, filter, groupBy, ...)
- Manipulated through various parallel operators
- Automatically rebuilt on failure: a partition is recomputed if it is lost
- Interface: clean language-integrated API in Scala (also Python & Java); can be used interactively from the Scala console

RDD Operations
- Transformations (define a new RDD): map, filter, sample, union, groupByKey, reduceByKey, sortByKey, join, ...
- Actions (return a result to the driver): reduce, collect, count, first, take, countByKey, saveAsTextFile, saveAsSequenceFile, ...

Example: Log Mining
- Load error messages from a log into memory, then interactively search for various patterns

val lines = spark.textFile("hdfs://...")            // base RDD
val errors = lines.filter(_.startsWith("ERROR"))    // transformed RDD
val messages = errors.map(_.split("\t")(2))
val cachedMsgs = messages.cache()
cachedMsgs.filter(_.contains("foo")).count
cachedMsgs.filter(_.contains("bar")).count

- The driver ships tasks to workers; each worker reads its input block and caches its partition of the messages RDD
- Result: full-text search of Wikipedia in <1 sec (vs. 20 sec for on-disk data)
- Result: scaled to 1 TB of data in 5-7 sec (vs. 170 sec for on-disk data)

Lineage-based Fault Tolerance
- RDDs maintain lineage information that can be used to reconstruct lost partitions
- Example:
  cachedMsgs = textFile(...).filter(_.contains("error")).map(_.split("\t")(2)).cache()
  HdfsRDD (path: hdfs://...) -> FilteredRDD (func: contains(...)) -> MappedRDD (func: split(...)) -> CachedRDD

RDD Example: Word Count in Spark!

val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

- Productive: 3 lines! Scalable, high-performance, fault-tolerant

Benefits of RDD Model
- Consistency is easy due to immutability
- Inexpensive fault tolerance (log lineage rather than replicating/checkpointing data)
- Locality-aware scheduling of tasks on partitions
- Despite being restricted, the model seems applicable to a broad variety of applications
- Easy programming, high performance, scalable

Logistic Regression Performance
- Apache Hadoop: 127 s/iteration
- Apache Spark: first iteration 174 s, further iterations 6 s

Architecture Overview of Hadoop, Spark, and Memcached
- Overview of Apache Hadoop architecture and its components: MapReduce, HDFS, RPC, HBase
- Spark
- Overview of Web 2.0 architecture and Memcached

Overview of Apache Hadoop Architecture
- Open-source implementation of Google MapReduce, GFS, and BigTable for Big Data analytics
- Hadoop Common utilities (RPC, etc.), HDFS, MapReduce, YARN
- Hadoop 1.x stack: MapReduce (cluster resource management & data processing) over the Hadoop Distributed File System (HDFS) over Hadoop Common/Core (RPC, ...)
- Hadoop 2.x stack: MapReduce and other models (data processing) over YARN (cluster resource management & job scheduling) over HDFS over Hadoop Common/Core (RPC, ...)

Projects Under Apache Hadoop
- Ambari: A web-based tool for provisioning, managing, and monitoring Hadoop clusters.
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad-hoc querying.
- Mahout: A scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- Spark: A fast and general compute engine for data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez: A generalized data-flow programming framework which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use cases. Tez is being adopted to replace Hadoop MapReduce.
- ZooKeeper: A high-performance coordination service for distributed applications.

Big Data Processing with Hadoop Components
- Major components covered in this tutorial: MapReduce (batch), HBase (query), HDFS (storage), RPC (inter-process communication)
- The underlying Hadoop Distributed File System (HDFS) is used by both MapReduce and HBase
- The model scales, but the high amount of communication during intermediate phases can be further optimized
- Stack: User Applications -> MapReduce / HBase -> HDFS -> Hadoop Common (RPC)

Typical Hadoop 1.x Cluster
- MapReduce framework
  - JobTracker: tracks and manages jobs, detects failures
  - TaskTrackers: host map/reduce task computation
- HDFS
  - NameNode: stores meta-information of data blocks
  - DataNodes: store blocks and support operations on blocks

Hadoop 1.x MapReduce Job Execution
- Main features:
  - Replication (e.g. 3)
  - Data locality for maps
  - HTTP-based shuffle
  - Speculative execution
  - Independence among tasks
- Goals: fault tolerance and scalability

MapReduce on Hadoop 1.x
- Clients submit MapReduce jobs to the JobTracker
- The JobTracker assigns Map and Reduce tasks to other nodes in the cluster
- These nodes each run a TaskTracker daemon in a separate JVM
- Each TaskTracker initiates the Map or Reduce tasks and reports progress back to the JobTracker

MapReduce on Hadoop 2.x -- YARN Architecture
- Resource Manager: coordinates the allocation of compute resources
- Node Manager: in charge of resource containers, monitoring resource usage, and reporting to the Resource Manager
- Application Master: in charge of the life cycle of an application, such as a MapReduce job; negotiates cluster resources with the Resource Manager and keeps track of task progress and status

Data Movement in Hadoop MapReduce
- Bulk data transfer and disk operations
- Map and Reduce tasks carry out the total job execution
- Map tasks read from HDFS, operate on the data, and write the intermediate data to local disk
- Reduce tasks get these data by shuffle from the TaskTrackers, operate on them, and write to HDFS
- Communication in the shuffle phase uses HTTP over Java sockets

Hadoop Distributed File System (HDFS)
- Primary storage of Hadoop; highly reliable and fault-tolerant
- Adopted by many reputed organizations, e.g. Facebook and Yahoo!
- NameNode: stores the file system namespace
- DataNode: stores data blocks
- Developed in Java for platform independence and portability
- Uses sockets for communication!
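The shuffle described above relies on each map task routing every intermediate key to a specific reduce task. Hadoop's default HashPartitioner does this with the key's hash code; the stand-alone sketch below reproduces that rule outside the real org.apache.hadoop.mapreduce API (class and method names here are illustrative).

```java
// Sketch of Hadoop's default hash partitioning rule: each intermediate
// (key, value) pair emitted by a map task is routed to reducer
// (hash(key) & Integer.MAX_VALUE) % numReduceTasks, so all values for a
// given key meet at the same reducer during the shuffle.
public class HashPartitionSketch {
    public static int partitionFor(String key, int numReduceTasks) {
        // Mask off the sign bit so the result is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"hadoop", "spark", "memcached"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

With a single reducer every key maps to partition 0, which is why one-reducer jobs produce globally merged output.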
[Figure: HDFS architecture - clients, NameNode, and DataNodes communicate via RPC]

Network-Level Interaction Between Clients and Data Nodes in HDFS

Hadoop RPC Usage
- RPC in HDFS: DataNodes report their current load and information on all stored blocks to the NameNode; HDFS clients use RPC to communicate with the NameNode
- RPC in MapReduce: exchanges information between the JobTracker and TaskTrackers (e.g. heartbeat messages, task completion events, error reports); also used to submit a job for execution and get the current system status
- RPC in HBase: a core communication scheme for HBase Put/Get operations

Spark Architecture Overview
- An in-memory data-processing framework: iterative machine learning jobs, interactive data analytics
- Scala-based implementation
- Runs standalone or on YARN or Mesos
- Scalable and communication-intensive: wide dependencies between Resilient Distributed Datasets (RDDs), MapReduce-like shuffle operations to repartition RDDs, sockets-based communication

Spark Ecosystem
- Generalizes MapReduce to support new apps in the same engine
- Two key observations: general task support with DAGs; multi-stage and interactive apps require faster data sharing across parallel jobs
- Components: BlinkDB, Spark SQL, Spark Streaming (real-time), GraphX (graph), MLbase (machine learning), running on Spark Standalone, Apache Mesos, or YARN

HBase Overview
- Apache Hadoop database
- Semi-structured database, highly scalable
- Integral part of many datacenter applications, e.g. Facebook Social Inbox
- Developed in Java for platform independence and portability
- Uses sockets for communication!
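Since HDFS, MapReduce, and HBase all communicate over Java sockets, the pattern underneath their RPC layers is a request/response exchange like the self-contained toy below (a hypothetical heartbeat, not Hadoop's actual RPC wire format).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Toy illustration of the socket-based request/response pattern used by
// Hadoop-style RPC: a server thread answers each request line, and a
// client sends a "heartbeat" and reads back the acknowledgement.
public class SocketRpcSketch {
    public static String heartbeatOnce() {
        try (ServerSocket server = new ServerSocket(0)) {  // ephemeral port
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println("ACK:" + in.readLine());  // acknowledge the request
                } catch (IOException ignored) { }
            });
            serverThread.start();
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
                out.println("HEARTBEAT");  // e.g. a TaskTracker reporting in
                return in.readLine();      // "ACK:HEARTBEAT"
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(heartbeatOnce());  // prints ACK:HEARTBEAT
    }
}
```

Every such round trip crosses the kernel TCP/IP stack, which is exactly the overhead the RDMA-based designs discussed later in the tutorial aim to remove.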
Network-Level Interaction Between HBase Clients, Region Servers and Data Nodes

Overview of Web 2.0 Architecture and Memcached
- Three-layer architecture of Web 2.0: web servers, Memcached servers, database servers
- Memcached is a core component of the Web 2.0 architecture

Memcached Architecture
- Distributed caching layer that aggregates spare memory from multiple nodes
- General purpose; typically used to cache database queries and results of API calls
- Scalable model, but typical usage is very network-intensive

Trends for Commodity Computing Clusters in the TOP500 List
- Clusters now account for 87% of TOP500 systems

Overview of High Performance Interconnects
- High-Performance Computing (HPC) has adopted advanced interconnects and protocols: InfiniBand, 10 Gigabit Ethernet/iWARP, RDMA over Converged Enhanced Ethernet (RoCE)
- Very good performance: low latency (a few microseconds), high bandwidth (100 Gb/s with dual FDR InfiniBand), low CPU overhead (5-10%)
- The OpenFabrics software stack with IB, iWARP, and RoCE interfaces is driving HPC systems; many such systems appear in the TOP500 list

All Interconnects and Protocols in the OpenFabrics Stack
[Figure: applications/middleware reach the network either through sockets (kernel TCP/IP, RSockets, SDP, or IPoIB) or through user-space verbs with RDMA hardware offload; the resulting options are 1/10/40/100 GigE, IPoIB, 10/40 GigE-TOE, RSockets, SDP, iWARP, RoCE, and native IB, over the corresponding Ethernet or InfiniBand adapters and switches]

Trends of Networking Technologies in TOP500 Systems
- The percentage share of InfiniBand is steadily increasing

Large-scale InfiniBand Installations
- 259 IB clusters (51%) in the June 2015 TOP500 list
- Installations in the Top 50 (24 systems) include:
  - 519,640 cores (Stampede) at TACC (8th)
  - 185,344 cores (Pleiades) at NASA/Ames (11th)
  - 72,800 cores Cray CS-Storm in US (13th)
  - 72,800 cores Cray CS-Storm in US (14th)
  - 265,440 cores SGI ICE at Tulip Trading Australia (15th)
  - 124,200 cores (Topaz) SGI ICE at ERDC DSRC in US (16th)
  - 72,000 cores (HPC2) in Italy (17th)
  - 115,668 cores (Thunder) at AFRL/USA (19th)
  - 147,456 cores (SuperMUC) in Germany (20th)
  - 86,016 cores (SuperMUC Phase 2) in Germany (21st)
  - 76,032 cores (Tsubame 2.5) at Japan/GSIC (22nd)
  - 194,616 cores (Cascade) at PNNL (25th)
  - 76,032 cores (Makman-2) at Saudi Aramco (28th)
  - 110,400 cores (Pangea) in France (29th)
  - 37,120 cores (Lomonosov-2) at Russia/MSU (31st)
  - 57,600 cores (SwiftLucy) in US (33rd)
  - 50,544 cores (Occigen) at France/GENCI-CINES (36th)
  - 76,896 cores (Salomon) SGI ICE in Czech Republic (40th)
  - 73,584 cores (Spirit) at AFRL/USA (42nd)
  - and many more!
Open Standard InfiniBand Networking Technology
- Introduced in Oct 2000
- High-performance data transfer: interprocessor communication and I/O
- Low latency (~1.0 microsecond), high bandwidth (up to 12.5 GigaBytes/sec - 100 Gbps), and low CPU utilization (5-10%)
- Flexibility for LAN and WAN communication
- Multiple transport services: Reliable Connection (RC), Unreliable Connection (UC), Reliable Datagram (RD), Unreliable Datagram (UD), and Raw Datagram; provides flexibility to develop upper layers
- Multiple operations: Send/Recv, RDMA Read/Write, and atomic operations (very unique), enabling high-performance and scalable implementations of distributed locks, semaphores, and collective communication operations
- Leading to big changes in designing HPC clusters, file systems, cloud computing systems, grid computing systems, etc.

Communication in the Channel Semantics (Send/Receive Model)
- Each queue pair (QP) has send and receive queues backed by registered memory segments; completions are reported through a completion queue (CQ)
- The processor is involved only to: (1) post a receive WQE, (2) post a send WQE, and (3) pull completed CQEs from the CQ; the InfiniBand device performs the actual transfer
- The send WQE contains information about the send buffer
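The three steps above can be mimicked with a toy, single-process model: the receiver pre-posts a receive WQE describing a buffer, the sender posts a send WQE naming its buffer, the "device" moves the data, and both sides pull CQEs. All names below are illustrative; real verbs programming uses the OpenFabrics C API (ibv_post_recv, ibv_post_send, ibv_poll_cq).

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of InfiniBand channel (send/receive) semantics. The receiver
// must pre-post a receive WQE describing a buffer; the sender then posts a
// send WQE; the simulated device copies the data into the pre-posted buffer
// and generates completion queue entries (CQEs) on both sides.
public class ChannelSemanticsSketch {
    static class QueuePair {
        final Queue<byte[]> recvQueue = new ArrayDeque<>(); // posted receive buffers
        final Queue<String> cq = new ArrayDeque<>();        // completion queue

        void postRecv(byte[] buffer) { recvQueue.add(buffer); }  // step 1
    }

    // Steps 2-3: post a send WQE; the "device" consumes a pre-posted
    // receive WQE on the target and completes both work requests.
    static void postSend(QueuePair sender, QueuePair receiver, byte[] sendBuf) {
        byte[] recvBuf = receiver.recvQueue.remove();  // fails if no recv was posted
        System.arraycopy(sendBuf, 0, recvBuf, 0, sendBuf.length);
        sender.cq.add("SEND_COMPLETE");
        receiver.cq.add("RECV_COMPLETE");
    }

    public static void main(String[] args) {
        QueuePair a = new QueuePair(), b = new QueuePair();
        byte[] recvBuf = new byte[5];
        b.postRecv(recvBuf);                 // receiver pre-posts a buffer
        postSend(a, b, "hello".getBytes());  // sender posts a send WQE
        System.out.println(a.cq.poll() + " " + b.cq.poll() + " " + new String(recvBuf));
        // SEND_COMPLETE RECV_COMPLETE hello
    }
}
```

The model also shows why channel semantics require receiver participation (a pre-posted buffer), in contrast to RDMA Read/Write memory semantics, where the remote CPU is not involved in the transfer.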