Getting Started with Big Data Analytics for the Enterprise
Mike Biere, IBM
Tuesday, August 7th
Session Number: 11930

IBM's Big Data Portfolio
IBM views Big Data at the enterprise level, so we are not honing in on one aspect such as analysis of social media or federated data. Our solutions span four key areas:
1. Data Warehouse (Information Server, DB2 Analytics Accelerator, Netezza, etc.)
2. BigInsights (Hadoop, etc.)
3. Stream data capture and analysis
4. Federated data discovery and analysis

What can you do with big data?
- Act on deeper customer insight: social media customer sentiment analysis, promotion optimization, segmentation, customer profitability, click-stream analysis, CDR processing, multi-channel interaction analysis, loyalty program analytics, churn prediction.
- Create innovative new products: social media product/brand sentiment analysis, brand strategy, market analysis, RFID tracking and analysis, transaction analysis to create insight-based product/service offerings.
- Optimize your operational processes: smart grid/meter management, distribution load forecasting, sales reporting, inventory and merchandising optimization, options trading, ICU patient monitoring, disease surveillance, transportation network optimization, store performance, environmental analysis, experimental research.
- Proactively maintain your assets: network analytics, asset management and predictive issue resolution, website analytics, IT log analysis.
- Prevent fraud and reduce risk: multimodal surveillance, cyber security, fraud modeling and detection, risk modeling and management, regulatory reporting.

Pains Addressed by a Big Data Platform
- High cost of storing and analyzing data, combined with growing data volumes.
- Cost and performance of the enterprise data warehouse: a single DW cannot meet everyone's needs.
- Inability to exploit new sources of data: the need to explore them, prove their value, and extract it cost-effectively.
- Loss of fidelity and the huge time and cost of converting unstructured data (video, audio, textual content) to a structured format for analysis.
- Inability to act on data in real time, and the high cost of doing so, leads to lost opportunities.
- High cost of maintaining data online when it could exist in a query-able online archive.

IBM Big Data Strategy: move the analytics closer to the data
New analytic applications drive the requirements for a big data platform: BI/reporting, exploration/visualization, functional applications, industry applications, predictive analytics, and content analytics. The platform must:
- Integrate and manage the full variety, velocity, and volume of data.
- Apply advanced analytics to information in its native form.
- Visualize all available data for ad hoc analysis.
- Provide a development environment for building new analytic applications.
- Handle workload optimization and scheduling.
- Provide security and governance.
The IBM Big Data Platform combines visualization and discovery, application development, systems management, and accelerators on top of the Hadoop system, stream computing, the data warehouse, and information integration and governance.

Big Data Platform: a consultant's view
Diagnose: determine pain and leading product.
1. Understand their data assets.
2. Understand current use of their data assets and planned future use.
3. Understand new sources and combinations of data that have business impact.
4. Message big data pain points and use cases.
Prescribe:
1. Message IBM thought leadership on leveraging their existing data assets and the big data platform.
2. Identify the granular pain point.
3. Determine the business case for big data.
4. Determine the lead product.
5. Further qualify that product.
Platform: expand the use case for the BD product and cross-sell.
1. Determine the use case.
2. Position the big data platform (complementary products for the use case).
3. Determine cross-sell potential for BD products.
Big Data Platform opportunity flow
Start with the initial opportunity identification at the platform level, determine the pain and the lead product, then expand the use case for the specific big data product:
- Data Warehouse opportunity → Data Warehouse use case → BD platform cross-sell.
- BigInsights opportunity → BigInsights use case → BD platform cross-sell.
- Streams opportunity → Streams use case → BD platform and cross-sell.
- Federated Discovery & Navigation opportunity → FDN use case → BD platform and cross-sell.

Big Data Platform self-assessment
Questions:
- Do you have performance challenges with your DW? A high number of concurrent users and queries? Do you expect your query and user volume to grow?
- Is the volume in your DW increasing (TBs and PBs)?
- Do you want to analyze both structured and unstructured data together, without converging them to one schema?
- Are there any projects where you do not analyze the full volume of data available to you? Why not?
- Are you concerned with the cost of managing growing data volumes in traditional technology?
- Do you have the need to analyze data in real time?
- Would you like to analyze a body of data that is simply too large to persist in any technology?
Pains:
- Too much latency for user queries to the DW.
- The volume of structured information is growing and straining performance.
- Inability to analyze a variety of data in its native format.
- Persisting and analyzing all available data results in poor performance or huge costs.
- Inability to analyze data in motion, resulting in too much latency in insight.
- Too costly to store and analyze all available data.
Lead products: Data Warehousing, InfoSphere BigInsights, InfoSphere Streams, Big Data Platform.

Data Warehouse products on the Big Data Platform: InfoSphere Information Server, InfoSphere Warehouse, IBM DB2 Analytics Accelerator (IDAA), Netezza.

InfoSphere Information Server
Information Server and Foundation Tools for System z
[Architecture diagram: Information Server components span Windows, z/OS, and Linux for System z — DataStage, QualityStage, Information Analyzer, administration, business glossary and reporting clients, MetaBrokers, and a roles-based web console ("design tools work the way you do"). A shared metadata repository promotes reuse, compliance, and visual lineage and impact analysis, and a common reusable services framework (Information Service Framework on WebSphere Application Server, in the box) leverages the power of an SOA environment, including a Federation Server. Connectivity includes connectors for DB2 for z/OS, DB2 on Linux/UNIX/Windows, Informix Dynamic Server, Oracle, MS SQL Server, and Sybase; parallel read and load through DB2 Connect and through Batch Pipes; SFTP and DRDA; TCP/IP between zLinux and z/OS that can leverage HiperSockets (SuSE and Red Hat); and Classic Connect for read/write access to QSAM, VSAM, IAM, IMS, CA-IDMS, CA-Datacom, and Software AG ADABAS, which qualifies for zIIP offload. Change Data Capture (Data Event Publisher or CDC over MQ) supports incremental updating, with an optional parallel engine providing robust parallel processing with minimal impact on DB2 z/OS costs; application packs are available.]
InfoSphere Warehouse for DB2 for z/OS
[Architecture diagram: design and administration clients (Design Studio, Admin Console, MQT Advisor on Windows/Linux, Eclipse and IE/Firefox based) and BI/reporting tools and applications (Cognos 8 BI for System z via MDX, Excel, third parties and business partners via SQL) connect through JDBC/DB2 Connect to an application server tier on a Linux on System z partition/IFL (WebSphere Application Server running the Cubing Services engine and the SQW runtime), which works against DB2 for z/OS as the data warehouse server (MQTs, cube metadata, control database), fed from IMS, VSAM, and RDBMS source systems.]

IBM DB2 Analytics Accelerator V2 product components
[Component diagram: users and applications run the data warehouse application against DB2 for z/OS on zEnterprise, enabled for the IBM DB2 Analytics Accelerator; the accelerator itself is built on Netezza technology (BladeCenter), attached over an OSA-Express3 10 GbE network with primary and backup links, and administered through the DB2 Analytics Accelerator admin plug-in for Data Studio Foundation.]

Performance and savings
[Table: actual customer results, October 2011 — nine queries compared by total rows reviewed, total rows returned, elapsed time with DB2 only versus DB2 with IDAA, and the resulting times-faster factor.]
Queries run faster, saving CPU resources, people time, and business opportunities.
"DB2 Analytics Accelerator: we had this up and running in days, with queries that ran over 1000 times faster."
"DB2 Analytics Accelerator: we expect ROI in less than 4 months."
Accelerating decisions to the speed of business.

Big Data Platform: BigInsights
On the big data platform, unstructured data, social media, and similar sources are handled by IBM BigInsights, Hadoop, Cloudera, etc.

Financial industry regulatory client (Greenplum)
Backdrop:
- 20 billion records per day processed; 6 TB per day.
- Data is growing 80% per year, and the client refuses to see their costs grow 80% per year.
- Goal: introduce a Hadoop store to reduce costs.
- Current landscape: NAS, Oracle, Netezza — where does Hadoop fit?
Plan:
- Short term: bring BigInsights into their environment while allowing them to easily leverage current front ends (Cognos, applications, custom web interfaces, etc.).
- They are interested in three accelerators: text, statistics, and predictive.
- Long term: move data segments out of the traditional warehouses and databases and into Hadoop (a minimal sketch of this style of processing follows below).
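The deck itself contains no code, so the following is only an illustrative sketch of what batch processing over a Hadoop store can look like once data segments land there: a Hadoop Streaming-style mapper and reducer written in Python. The pipe-delimited record layout, the field positions, the idea of counting records per account per day, and the file name are all assumptions made for the example, not part of the client scenario.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming job: count records per account per day.

Local simulation of the map -> shuffle/sort -> reduce flow:
  cat records.txt | python3 cdr_counts.py | sort | python3 cdr_counts.py reduce
On a cluster the same two phases run as the -mapper and -reducer of a
Hadoop Streaming job.
"""
import sys


def mapper(lines):
    """Emit 'account_id|date<TAB>1' for each pipe-delimited input record."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 2:               # skip malformed records instead of failing
            continue
        account_id, date = fields[0], fields[1]
        print(f"{account_id}|{date}\t1")


def reducer(lines):
    """Sum the counts for each 'account_id|date' key.

    The shuffle/sort phase delivers keys in sorted order, so a running
    total per key is enough.
    """
    current_key, total = None, 0
    for line in lines:
        key, count = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{total}")
            total = 0
        current_key = key
        total += int(count)
    if current_key is not None:
        print(f"{current_key}\t{total}")


if __name__ == "__main__":
    # One file serves as both scripts; an argument selects the reduce phase.
    if len(sys.argv) > 1 and sys.argv[1] == "reduce":
        reducer(sys.stdin)
    else:
        mapper(sys.stdin)
```

The point of the sketch is simply that the analysis moves to where the data sits: the existing front ends keep issuing their questions, while jobs like this scan the raw, schema-less records in place.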
Profile of a BigInsights solution
- Very large data sets (TBs to PBs).
- Schema-less data in its native format.
- Low user concurrency.
- Open source, non-proprietary solution.
- Support for non-SQL development tools (MapReduce, R).
- Need to explore data with questions you cannot anticipate.
- Analytics across, and matching of, unstructured and non-standard data types.
- Store data once but look at it in multiple ways, i.e., multiple data structures.
- Desire to analyze data in place, without moving or loading it.
- An analytical sandbox to explore data, outside the organization's official restricted-access data management platforms.
- A large data archive that you want available for occasional query and reporting access, but which is not valuable enough to host in a warehouse.

Big Data Platform: Streams
On the big data platform, unstructured data, social media, and similar sources feed IBM InfoSphere Streams.

IBM InfoSphere Streams v2.0: a platform for real-time analytics on big data
- Volume: terabytes per second, petabytes per day, millions of events per second.
- Variety: all kinds of data, all kinds of analytics, traditional and non-traditional data sources.
- Velocity: insights in microseconds, microsecond latency, real-time decisions.
- Agility: dynamically responsive, rapid application development.
- Powerful analytics for use cases such as ICU monitoring, algorithmic trading, cyber security, environment monitoring, smart grid, government/law enforcement, and telco churn prediction.

Why InfoSphere Streams?
Applications that require on-the-fly processing, filtering, and analysis of streaming data:
- Sensors: environmental, industrial, surveillance video, GPS.
- "Data exhaust": network, system, web server, and application server log files.
- High-rate transaction data: financial transactions, call detail records.
Criteria (two or more of the following):
- Messages are processed in isolation or in limited data windows (see the sketch after this section).
- Sources include non-traditional data (spatial, imagery, text, ...).
- Sources vary in connection methods, data rates, and processing requirements, presenting integration challenges.
- Data rates and volumes require the resources of multiple processing nodes.
- Analysis and response are needed with sub-millisecond latency.
- Data rates and volumes are too great for store-and-mine approaches.

Streams usage case
- Data in motion, streaming data; the value shelf life of the data is narrow.
- Volume + speed + analytic requirement = performance challenge.
- Structured, unstructured, or non-conventional data types.
- Bring the analytics to the data, not the data to the analytics.
- Real-time scoring using predictive models or a rules-based engine.
- Need to examine and respond to information in real time: "I can't ingest, examine, and respond to the new high-speed, high-volume data sources hitting my existing DM and DW solutions."
- React sooner to reduce risk, detect and prevent fraud, or prevent dangers (national security, power plants).
- Notice events sooner to capture sales opportunities, connect with in-market customers, or notice patterns that matter to the business.
- Respond to new information in flight, in real time, before it lands.
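InfoSphere Streams applications are normally written in SPL, its own streams processing language, so the short Python sketch below is not Streams code; it only illustrates the pattern the usage case describes: each record is handled once, inside a bounded sliding window, and a result is emitted immediately rather than stored for later batch mining. The record fields, window length, and threshold are invented for the example.

```python
"""Sketch of window-based, on-the-fly stream processing (fields and thresholds assumed)."""
from collections import defaultdict, deque

WINDOW_SECONDS = 300        # assumed sliding-window length
DROP_THRESHOLD = 3          # assumed alert threshold


def process_stream(cdrs):
    """Yield (caller, drop_count) alerts from an iterable of CDR dicts.

    Each call detail record is assumed to look like:
        {"caller": "A", "ts": 1344340000.0, "dropped": True}
    """
    windows = defaultdict(deque)              # caller -> timestamps of recent dropped calls
    for cdr in cdrs:
        if not cdr["dropped"]:
            continue
        window = windows[cdr["caller"]]
        window.append(cdr["ts"])
        # Evict events that have fallen out of the sliding window.
        while window and cdr["ts"] - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= DROP_THRESHOLD:
            yield (cdr["caller"], len(window))


if __name__ == "__main__":
    sample = [{"caller": "A", "ts": 60.0 * i, "dropped": True} for i in range(5)]
    for alert in process_stream(sample):
        print("possible churn signal:", alert)
```

Nothing here is persisted beyond the window, which is the essence of the "store-and-mine is too slow" argument: the insight is produced while the data is still in flight.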
Telephony architecture for churn prediction
[Architecture diagram: telecom data and call detail records flow through data preprocessing into two models whose outputs are combined. Summary statistic extraction over the telecom data yields real-time summary statistics; preprocessed CDRs feed a predictive churn model (a complex decision tree over calling patterns and user contracts); graph construction of edges and nodes feeds the SNAzzy model (social network analysis and customer value extraction) for churn and value prediction. Joint churn prediction combines the call-pattern-based and social-network-based scores, with KPI monitoring on top.]

Streams for real-time geomapping
- Multiple GPS data sources, each delivering thousands of probe points per second per source.
- Each probe point is mapped to the nearest polyline over a base of 200 million to 1 billion polylines, using a two-level grid-decomposition-based search.
- Hierarchical mapping produces a real-time location profile.
- Hardware: 14 blade servers (2x dual-core Xeon, GB RAM); 4 data preparation servers and 10 mapping servers.
- Performance: 941,000 probes per second against 1 billion polylines.
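The deck gives only the headline numbers for the geomapping application, so the sketch below shows, in deliberately simplified form, what a grid-decomposition search can look like: road polylines are bucketed into grid cells once, and each incoming GPS probe is compared only against segments in its own and neighboring cells rather than against all polylines. A single-level grid, planar coordinates, the cell size, and the tiny in-memory sample data are all simplifications, not the production design.

```python
"""Simplified grid-index lookup for snapping GPS probes to the nearest polyline."""
from collections import defaultdict
from math import hypot

CELL = 0.01   # grid cell size in degrees (assumed)


def _cell(x, y):
    return (int(x / CELL), int(y / CELL))


def build_index(polylines):
    """Bucket every segment of every polyline into the grid cells it spans."""
    index = defaultdict(list)
    for line_id, points in polylines.items():
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            (cx1, cy1), (cx2, cy2) = _cell(x1, y1), _cell(x2, y2)
            for cx in range(min(cx1, cx2), max(cx1, cx2) + 1):
                for cy in range(min(cy1, cy2), max(cy1, cy2) + 1):
                    index[(cx, cy)].append((line_id, (x1, y1), (x2, y2)))
    return index


def _dist_to_segment(px, py, a, b):
    """Planar distance from point (px, py) to segment a-b."""
    (x1, y1), (x2, y2) = a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return hypot(px - x1, py - y1)
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def nearest_polyline(index, px, py):
    """Search only the probe's cell and its eight neighbours for candidate segments."""
    cx, cy = _cell(px, py)
    best = None
    for nx in (cx - 1, cx, cx + 1):
        for ny in (cy - 1, cy, cy + 1):
            for line_id, a, b in index.get((nx, ny), []):
                d = _dist_to_segment(px, py, a, b)
                if best is None or d < best[1]:
                    best = (line_id, d)
    return best


if __name__ == "__main__":
    roads = {"main_st": [(0.0, 0.0), (0.05, 0.0)], "oak_ave": [(0.0, 0.02), (0.05, 0.02)]}
    idx = build_index(roads)
    print(nearest_polyline(idx, 0.02, 0.004))   # -> ('main_st', ~0.004)
```

The grid keeps the per-probe work bounded regardless of how many polylines exist in total, which is what makes throughput in the hundreds of thousands of probes per second plausible when the work is spread across many mapping servers.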
Big Data Platform: federated discovery
Federated discovery and navigation on the big data platform is provided by IBM Vivisimo.

Identifying a Vivisimo opportunity: what a consultant looks for
Top use cases:
1. Understanding and viewing/navigating big data sources before importing data into a Hadoop system.
2. Navigating, discovering, and viewing big data sources to understand their potential value.
3. Searching and navigating federated big data sources for operational applications: customer service, sales and product recommendation, order management.
Qualifying questions:
- How many sources of big data do you have in your business?
- Do you understand the potential value of those big data sources today?
- Do you have a need to discover and preview those data sources before importing and analyzing them?
- Are you wondering how to get started with big data, unsure which data to import into a Hadoop system?
- Do you have operational processes that require people to search multiple repositories of varied content? How much time is wasted on those manual processes?

Vivisimo overview: value proposition and differentiators
Value proposition:
- Accuracy: more relevant results due to position-based indexing.
- Security: respects the security rights of the underlying systems.
- Scalability: scales to trillions of records.
Differentiators:
- Unique federated discovery and navigation technology: a position-based rather than vector-based index, with clustering and faceting to navigate result data.
- Scalable architecture: fully distributed, fault-tolerant, unlimited scalability.
- Advanced on-the-fly analytics: state-of-the-art real-time text and metadata analytics.
- Secure connectivity: secure data integration of multiple repositories in complex IT environments.
- Powerful development tools: easy-to-deploy applications across varied and large data sets and sources.
- Fast time to value: rapid deployments from POCs to production.

Vivisimo Product Overview
IBM's Big Data platform will support open source distributions, supporting multiple client bases.
[Platform diagram: InfoSphere BigInsights adds advanced engines, visualization and discovery, indexing, and enterprise capabilities (application development, systems management, accelerators, connectors, workload optimization, administration and security) on top of the platform's Hadoop system, stream computing, data warehouse, and information integration and governance, serving analytic applications (BI/reporting, exploration/visualization, functional and industry apps, predictive and content analytics). The open-source-based components are IBM-certified Apache Hadoop or other open source distributions (future): Hadoop (file system), MapReduce (parallel processing), HBase (database), Oozie (workflow), and ZooKeeper (distributed coordination).]

Use case at a current IBM opportunity: improve customer satisfaction and lower costs
Problem:
- Gain a 360-degree view of the customer in order to offer optimal and relevant services customized to the customer's needs.
- Information is locked in multiple data sources (ERP, Teradata, mainframes, social media content, call center applications, homegrown applications, etc.), in the form of both structured and unstructured data.
Solution:
- Faceted exploration to explore all repositories, extracting and indexing all metadata.
- Extract relevancy and support an insurance-industry-specific ontology for text analytics.
- Provide connectivity to complex internal repositories such as ERP and mainframe systems, as well as external data such as the web, feeds, and Web 2.0 social media content.

Usage and applicability of Vivisimo
- Vivisimo can provide single-point access to all data sources without copying the data into Hadoop (see the sketch below).
- The ability to combine disparate data sources increases employee productivity and lowers costs.
- BigInsights can be used to run deep analytic jobs, while Vivisimo provides ad hoc answers.
- Multiple front-end applications leverage a common back-end infrastructure, with high availability and manageability for long analytic computing threads.
- BigInsights (Hadoop) handles batch processing; investigative analytic applications leverage the Vivisimo Velocity platform, which is suited for navigation and ad hoc analysis.
- Vivisimo connectors crawl, index, and load unstructured data (call center logs, etc.), content management systems, the Teradata warehouse, and mainframes: customer profiles (demographics, relevancy among family members and friends, product sentiments, needs), customer call details, client exchanges, lessons learned, ERP, CRM.
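Vivisimo's federation, position-based indexing, and faceting are features of the product itself and are not reproduced here; the sketch below only illustrates the general pattern the slides describe: fan a query out to several repositories through per-source adapters, merge the hits where they live, and compute facet counts over shared metadata so a user can navigate the combined result set. The adapter functions, field names, and sample records are all invented for the illustration.

```python
"""Sketch of federated search with faceted navigation (sources and fields invented)."""
from collections import Counter


def search_crm(query):
    """Stand-in for a CRM connector; a real adapter would call the source system in place."""
    records = [
        {"source": "crm", "type": "complaint", "text": "billing dispute on premium plan"},
        {"source": "crm", "type": "inquiry", "text": "upgrade options for family plan"},
    ]
    return [r for r in records if query in r["text"]]


def search_call_center(query):
    """Stand-in for a call-center log connector."""
    records = [
        {"source": "call_center", "type": "complaint",
         "text": "dropped calls, considering another plan"},
    ]
    return [r for r in records if query in r["text"]]


def federated_search(query, adapters):
    """Fan the query out to every adapter, merge the hits, and compute facet counts."""
    hits = []
    for adapter in adapters:
        hits.extend(adapter(query))
    facets = {
        "source": Counter(h["source"] for h in hits),
        "type": Counter(h["type"] for h in hits),
    }
    return hits, facets


if __name__ == "__main__":
    hits, facets = federated_search("plan", [search_crm, search_call_center])
    print(len(hits), "hits")           # 3 hits
    print(dict(facets["source"]))      # {'crm': 2, 'call_center': 1}
```

The important property for the use case above is that nothing is copied into a central store before it can be searched: each adapter queries its repository where it sits, and only the merged hit list and facet counts are assembled for the user.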
Summary
Where are you in the range of big data solution requirements?
- Data warehouse not in order? Start here.
- Big data team, project office, or competency center in place? No? Better get a plan in place.
- Data for analysis identified at all levels? No? An absolute necessity.
- Enterprise-level approach or fit-for-purpose approach? FFP? Better discuss how these may need to dovetail later, or you'll be starting over.
- Know what others in your industry are doing? No? Those who hesitate can be outperformed 2-3x.
- Identify and remember the implications of social media for your business. It isn't about kids tweeting anymore, but you do need to separate the noise from the real information.