American Journal of Engineering and Applied Sciences
Original Review Paper

Confluences among Big Data, Finite Element Analysis and High Performance Computing

1 Lidong Wang, 2 Guanghui Wang and 3 Cheryl Ann Alexander
1 Department of Engineering Technology, Mississippi Valley State University, USA
2 State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, China
3 Technology and Healthcare Solutions, Inc., USA

Corresponding Author: Lidong Wang, Department of Engineering Technology, Mississippi Valley State University, USA

© 2015 Lidong Wang, Guanghui Wang and Cheryl Ann Alexander. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.

Abstract: Big Data analyzes correlations in huge volumes of raw data and predicts outcomes. It has a great impact on scientific discovery and value creation. High Performance Computing (HPC) uses parallel processing and advanced programs or software packages to complete complicated jobs quickly. The Finite Element Method (FEM) is very powerful in scientific computation and engineering analysis and has created enormous value in almost every area of engineering. In many applications, Finite Element Analysis (FEA) relies strongly on advanced computer technology and HPC, and Big Data will play an important role in both FEA and HPC. This paper presents confluences among Big Data, FEA and HPC.

Keywords: Big Data, Finite Element Method (FEM), High Performance Computing (HPC), Big Data Analytics, Hadoop, MapReduce, Graphical Processing Unit (GPU)

Introduction

Scientific data is often massive, complex and heterogeneous. It is frequently manipulated through complex, distributed workflows built from application-specific (ad hoc) low-level code libraries. Big Data technology is expected to provide scalable query processing and scientific workflow management for such data (Pacitti and Valduriez, 2012). Big data is a massive volume of both structured and unstructured data, so large that it is difficult to process using traditional database and software techniques (Demchenko et al., 2013). Big data is often heterogeneous: each organization tends to produce and manage its own data in specific formats and with its own processes. Big data is also complicated; its complexity lies in uncertain data (because of how the data is captured), multiscale data with many dimensions, graph-based data, and so on. Continuous data streams are captured (e.g., from sensors or mobile devices), which produces streaming and dynamically changing big data (Pacitti and Valduriez, 2012).

Big Data is likely to be advantageous for comparing differences among competing design options. The combination of Big Data, Artificial Intelligence (AI) and massively parallel computing offers the potential to create a revolutionary way of practicing evidence-based and personalized medicine (Dilsizian and Siegel, 2014). Big data privacy is a sensitive issue with conceptual, legal and technological implications, and storage and I/O optimization for big-data computing is also an important issue.

There is a tremendous wealth of potentially valuable information in big data, and High Performance Computing (HPC) can help unlock it (Jean-François Lavignon, 2013). HPC offers immense potential for data-intensive computing, but as data explodes in volume, variety and velocity, it is becoming increasingly difficult to scale compute performance (Intel, 2014). Data-intensive HPC, massive storage and file systems, I/O architecture, low-power computing and automatic cloud provisioning for HPC are all active topics in HPC.
Data movement is very expensive, and reducing it is critical for HPC; exploiting data locality is often the best solution. The Finite Element Method (FEM) has been widely used in engineering, and finite element simulations have traditionally been performed on a wide range of computers. Advanced numerical methods (e.g., multiscale computation with multiscale material models, or finite element computation with adaptive mesh refinement) can generate data at large volumes and rates. Large-scale simulation workflows run on large supercomputers, and the data are dumped onto parallel disk systems (Parashar, 2014).

In one FEA framework, data were organized internally within the FEA core based on an object-oriented model and represented in three basic data types: Matrix, Vector and ID (integer). For external data representation, the eXtensible Markup Language (XML) was used as the standard for representing data in a platform-independent manner (Peng et al., 2003). However, ASCII/XML does not adapt well to highly voluminous and complex data such as large-scale finite element analysis data and heterogeneous product data, and XML does not express entity relationships well either (Folk, 2006).

There is growing interest in integrating Big Data with computational mechanics such as FEA. Big data poses algorithmic challenges in huge-scale finite element computation; for example, the data is distributed in space and the requested parts of the data are not always available (Nesterov, 2014). The development of finite element meshing and numerical optimization algorithms, parallel scientific computing in the HPC environment and the combination of Big Data and FEA can greatly improve the accuracy and efficiency of FEA.

Characteristics, Implementation and Technology of Big Data

Big data can come from various sources with low information density and with different structures (structured data; semi-structured documents; and unstructured text, graphs, images and video). It is often hard to integrate, verify and assess big data (Davenport, 2014). Big data characteristics can be described by the 6Vs: Volume, Variety, Value, Velocity, Veracity and Variability (Bellini et al., 2013; Demchenko et al., 2013; Jean-François Lavignon, 2013; O'Leary, 2013; Jagadish et al., 2014):

- Volume: massive amounts of data, which are hard to store and manage
- Variety: heterogeneity of data types, representations and semantic interpretations, which makes data integration hard
- Value: the collected data can bring added value to the intended process, activity or predictive analysis
- Velocity: data such as streaming data is generated at a rate that exceeds that of traditional systems, which makes online processing hard
- Variability: data changes during processing and over its lifecycle. Big data can be constantly changing (dynamic), so analysis often needs to run in real time, and highly varying data is challenging to deal with
- Veracity: the accuracy, truthfulness and reliability of the data. Veracity makes data analysis hard because big data is often noisy (uncertain); dealing with noisy big data and quantifying data uncertainty have become imperative, and both rely on computationally intensive statistical and machine-learning techniques
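The veracity and variability points above are where many practical pipelines struggle. As a purely illustrative sketch (not from the paper), the following Python fragment flags suspect readings in a numeric stream with a rolling median and median absolute deviation; the window size, threshold and synthetic readings are arbitrary assumptions.

```python
from collections import deque
import statistics

def flag_outliers(stream, window=50, k=3.5):
    """Yield (value, is_outlier) pairs using a rolling median/MAD test.

    Illustrative only: the window size and threshold k are arbitrary choices.
    """
    history = deque(maxlen=window)
    for x in stream:
        if len(history) >= 10:          # wait for a minimal history
            med = statistics.median(history)
            mad = statistics.median(abs(v - med) for v in history) or 1e-9
            is_outlier = abs(x - med) / mad > k
        else:
            is_outlier = False
        history.append(x)
        yield x, is_outlier

# Example: a regular synthetic sensor stream with one injected spike
readings = [20.0 + 0.1 * (i % 5) for i in range(100)]
readings[60] = 45.0
print([i for i, (_, bad) in enumerate(flag_outliers(readings)) if bad])
```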
Capgemini Consulting conducted a global survey of senior Big Data executives in November. The survey covered 226 respondents across Europe, North America and Asia Pacific and spanned a number of industries, including retail, manufacturing, pharmaceuticals, financial services, and energy and utilities. It targeted senior executives across the Analytics, IT and Business functions because they were responsible for overseeing Big Data initiatives in their organizations. Only 27% of the executives in the survey described their Big Data initiatives as successful. Table 1 lists the main challenges that organizations face in Big Data implementation (Colas et al., 2014).

Table 1. Key challenges for Big Data implementation (percentage of respondents)
- Scattered data lying in silos across various teams: 46%
- Absence of a clear business case for funding and implementation: 39%
- Ineffective coordination of Big Data and analytics teams in the organization: 35%
- Dependency on legacy systems for data processing and management: 31%
- Ineffective governance models for Big Data and analytics: 27%
- Lack of sponsorship from top management: 27%
- Lack of Big Data and analytics skills: 25%
- Lack of clarity on Big Data technology and tools: 22%
- Cost of specific tools and infrastructure for Big Data and analytics: 18%
- Data security and privacy concerns: 15%
- Resistance to change within the organization: 12%

Big Data for development is a matter of turning imperfect, complex and often unstructured data into actionable information. This implies leveraging advanced computational methods (such as machine learning) to reveal correlations and trends within and across large data sets. The intensive mining of socioeconomic data, known as reality mining, can shed light on processes and interactions. Reality mining can be done in three main ways (Letouzé, 2012):

- Continuous analysis over streaming data: using tools that scrape the Web to monitor and analyze high-frequency online data streams, including uncertain and inexact data
- Online digestion of semi-structured and unstructured data, such as news items and product reviews, to shed light on hot topics, perceptions, needs and wants
- Real-time correlation of streaming data (fast streams) with slowly accessible historical data repositories

Big data processing is the foundation of such applications. To improve data processing capability, the Hadoop framework is used for distributed storage and analysis of the collected big data (Yan et al., 2014). Hadoop is a Java-based framework and a heterogeneous open source platform. Its primary modules are the Hadoop Distributed File System (HDFS) and MapReduce (MR): HDFS provides high-throughput access to big data, while MR implements a high-level, implicitly parallel programming model (Davenport, 2014). Hadoop offers the techniques and tools listed in Table 2 (Eaton et al., 2012; Schneider, 2012; Davenport, 2014; Raghupathi and Raghupathi, 2014).

Table 2. Hadoop techniques, tools and their functions
- HDFS: a highly fault-tolerant distributed file system responsible for storing data
- MapReduce: a powerful parallel programming technique for distributed processing
- Pig: a scripting language for describing operations such as reading, filtering, transforming, joining and writing data
- Hive: a scripting language similar to Pig but more batch-oriented; it can transform data into a relational format for Structured Query Language (SQL) queries
- HBase: a scalable, distributed database for random read/write access
- Oozie: a workflow manager for dependent Hadoop jobs
- ZooKeeper: a centralized coordination service providing distributed synchronization and group services
- Sqoop: a data exchange project for transferring data between relational databases and Hadoop
- Flume: a log collector that performs reliable, distributed streaming log collection
- Avro: a data serialization system
- Chukwa: a Hadoop subproject providing a data collection system for monitoring distributed systems
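To make the MapReduce entry in Table 2 concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain scripts that read standard input and write tab-separated key/value pairs. The script names (mapper.py, reducer.py) are illustrative assumptions, not something taken from the paper.

```python
#!/usr/bin/env python3
# mapper.py -- emit one "word<TAB>1" pair per token (Hadoop Streaming convention)
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sum the counts for each word; input arrives grouped/sorted by key
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Outside Hadoop, roughly the same flow can be exercised locally with: cat input.txt | python3 mapper.py | sort | python3 reducer.py, where the sort step stands in for the framework's shuffle phase.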
Big Data can extend applications through: (1) robust and highly distributed file systems capable of managing Big Data applications; (2) flexible methods for handling large quantities of data in highly parallel computing environments, such as MapReduce, a parallel merge-sort algorithm; (3) Structured Query Language (SQL) and Not only SQL (NoSQL) capabilities for retrieving and storing data from dynamically growing databases or data streams (Douglas, 2014).

Features, Functions and Challenges of High Performance Computing

High Performance Computing (HPC) uses parallel processing to run advanced application programs quickly and reliably. High performance computer clusters can complete many different kinds of jobs; for example, an optimizing parallel genetic algorithm was implemented on an HPC architecture using clusters of computers with several Graphical Processing Units (GPUs). Clusters can be high performance computing clusters, load-balancing clusters or fail-over clusters. Linux and Microsoft Azure are common platforms used for HPC. Lustre is a POSIX-compliant parallel file system used in HPC. It is currently an open-source system that is often used in conjunction with Hadoop to provide a distributed, scalable and high performance file system in place of HDFS; it enables HPC engines to maintain performance by aggregating multiple I/O paths to multiple servers in the compute cluster (Guillén et al., 2014; Slack, 2014).

Traditional HPC data pipelines face the following challenges (Parashar, 2014):

- Scalable data analytics
- I/O and storage: there is an increasing performance gap because disks are outpaced by computing speed
- Data movement: there is a lot of data movement between simulation and analysis, as well as between coupled multiphysics simulation components, which leads to long latencies
- Energy: much power is consumed by memory and data movement

According to IBM's data-centric design principles, data motion should be minimized and workflow parallelism should be increased to leverage low-power cores. Computation and analytics should be moved closer to the data; it is best for work to be done where the data resides. It is also important to integrate in-situ and in-transit analytics: primary resources execute the main simulation and in-situ computations, while secondary resources provide a staging area whose cores act as containers for in-transit computations (Parashar, 2014).
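As a toy illustration of these data-centric principles (not from the paper), the following Python sketch has each worker process generate its "simulation" output locally and reduce it in place, so that only a small summary per worker, rather than the raw data, travels back to the primary process. The names, sizes and use of the multiprocessing module are assumptions made for the example.

```python
# Toy sketch of in-situ style reduction: each worker produces its data locally
# and condenses it immediately, so only (count, mean, min, max) crosses the
# process boundary instead of the full output.
from multiprocessing import Pool
import random

def simulate_and_summarize(seed, n=100_000):
    """Produce n samples locally and return only a small summary tuple."""
    rng = random.Random(seed)
    total = 0.0
    lo, hi = float("inf"), float("-inf")
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)     # stand-in for one simulation output value
        total += x
        lo, hi = min(lo, x), max(hi, x)
    return n, total / n, lo, hi

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        summaries = pool.map(simulate_and_summarize, range(8))
    grand_n = sum(c for c, _, _, _ in summaries)
    grand_mean = sum(c * m for c, m, _, _ in summaries) / grand_n
    print(f"samples={grand_n}, mean={grand_mean:.4f}")
```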
Confluence between Big Data and Finite Element Analysis

Traditionally, Finite Element Analysis (FEA) is conducted using only one geometric configuration, one set of boundary conditions and one set of material properties or parameters. Results are stored only for every n-th time step of the analysis (where n is normally greater than 2), the computed results are often kept only for a set of points in the geometric design, and only some of the variables (such as the minimum and maximum stresses and strains) are analyzed quantitatively.

In many applications, however, geometric shapes change over service time due to loads, material parameters and the stress-strain relationship of materials change under various temperatures, and boundary conditions change as well. The FEA problem becomes very complicated when a combination of these changing aspects occurs. Statistical or stochastic approaches to handling changing geometric configurations, boundary conditions and material properties are expected to give better numerical descriptions of the behavior and performance of the true physical system.

Some large FEA applications produce analysis results at the terabyte (TB) level. Big Data (with statistical and stochastic methods) can help FEA handle large result sets and deal with highly varying conditions and properties; it is also powerful for resolving problems in large-scale data storage, analysis and visualization. The confluence between Big Data and FEA can provide more comprehensive analysis, predict defects and failures, and optimize designs in time.

A generalized workflow has been outlined for integrating multi-modal measurements and multi-physics models at multiple hierarchical length scales to accelerate materials development. Protocols for directly and efficiently linking materials models and databases into process/performance simulation codes (e.g., the crystal plasticity finite element method) were studied, and a Big Data based workflow was given for integrating mesoscale heterogeneities in material structure with process simulation. These Big Data based efforts were aimed at improving the accuracy of predictive modeling (Salem et al., 2014).

There will be more applications of FEA enhanced by Big Data. The following can be challenges or driving forces: (1) FEA model sizes are increasing continuously; (2) analysis types such as nonlinear and dynamic analysis are becoming more common; (3) software efficiency and hardware performance are increasing; (4) one of the limits on realistic model size is the ability to store and retrieve the vast amount of data generated in FEA (Abbey, 2014); (5) there are problems with sparse data in large-scale FEA, and sometimes the data changes with time; (6) hierarchical meshes produced by adaptive mesh refinement are often needed in huge-scale FEA for fatigue crack incubation, crack growth and fracture problems in aeronautical and astronautical engineering; (7) multiscale material models (macro, micro and nano) and multiscale computation (macro FEA, crystal plasticity FEA and molecular dynamics) can provide more accurate results in some applications.
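To make the statistical treatment described above concrete, here is a deliberately small Monte Carlo sketch, not taken from the paper: a one-dimensional bar modeled with two-node linear finite elements is re-solved for randomly sampled values of Young's modulus, and the resulting tip displacements are summarized. The geometry, load, element count and sample count are arbitrary assumptions.

```python
# Toy Monte Carlo over material uncertainty for a clamped 1D bar with an end
# load: assemble K, solve K u = f, repeat for sampled Young's modulus values.
import numpy as np

def bar_tip_displacement(E, n_elem=20, L=1.0, A=1e-4, P=1000.0):
    """Tip displacement of a clamped bar of length L, area A, end load P."""
    h = L / n_elem
    k_e = (E * A / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element stiffness
    K = np.zeros((n_elem + 1, n_elem + 1))
    for e in range(n_elem):
        K[e:e + 2, e:e + 2] += k_e                            # assembly
    f = np.zeros(n_elem + 1)
    f[-1] = P
    # Apply the clamp at node 0 by solving only on the free degrees of freedom.
    u_free = np.linalg.solve(K[1:, 1:], f[1:])
    return u_free[-1]

rng = np.random.default_rng(0)
E_samples = rng.normal(200e9, 10e9, size=2000)   # Young's modulus samples, Pa
tips = np.array([bar_tip_displacement(E) for E in E_samples])
print(f"tip displacement: mean={tips.mean():.3e} m, std={tips.std():.3e} m")
```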
Confluence between Big Data and High Performance Computing

Traditionally, HPC was chiefly compute-intensive rather than data-intensive. New HPC usage models now deal with HPC workloads in cloud computing, increasing heterogeneity of data and an explosion of data volumes ("big data"). Big Data Analytics (BDA) techniques and tools can handle large and diverse semi-structured and unstructured datasets, which increases the overhead of data access and must be handled with parallelized methods. Big data is by nature distributed; distributed algorithms are therefore key, and data migration is necessary and important. The computational complexity of analyzing big data and its sheer size mean that data-intensive computing is very important (Jean-François Lavignon, 2013). A comparison between Big Data Analytics (BDA) and High Performance Computing (HPC) is shown in Table 3 (Valduriez, 2015). The combination of HPC and BDA is sometimes called data-intensive HPC.

Table 3. Big Data Analytics (BDA) vs. High Performance Computing (HPC)
- Computing model: BDA is data-centric (move tasks to the data and reduce); HPC is compute-centric (move data to the tasks and accelerate)
- Data storage: BDA uses uniform storage (sharding) on disks; HPC uses hierarchical storage (disks, tapes, etc.)
- Parallel file management: BDA file systems are designed for a few big files (e.g., the Hadoop Distributed File System, HDFS); HPC file systems are designed for many small files (e.g., Lustre)
- Programming model: BDA uses algebraic operators (e.g., MapReduce, Spark); HPC uses the Message Passing Interface (MPI) or Open Multi-Processing (OpenMP)
- Languages: BDA typically uses Java, Python and C++; HPC typically uses C and C++

As Big Data frameworks find their way into production environments, users face the challenges of integrating, scaling and stabilizing the performance of their clusters. The growing ubiquity of Solid State Disks (SSDs), high performance CPUs and advanced network clustering enables users to build efficient clusters, scale performance and provide a best-in-class environment for Hadoop and other Big Data applications (Gutkind, 2013).

One of the challenges of Big Data is to access data efficiently, applying massive parallelism not only to the computation but also to the storage. Although HPC clusters are typically equipped with high-performance interconnects, the real problems arise when data must be transferred between geographically distributed sites, because the Internet connection might not be suitable for transferring big data. Therefore, reducing data movement, i.e., exploiting data locality, is critical. In cluster computing, the data-parallel approach subdivides the data to be analyzed among almost independent processes and can be a suitable solution for Big Data analysis. In GPU computing, GPUs deliver extremely high floating-point performance and massive parallelism at very low cost (Merelli et al., 2014).

High Performance Data Analytics (HPDA) represents the confluence of HPC and Big Data analytics deployed onto HPC-style configurations.
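For contrast with the MapReduce sketch given earlier, the following is a minimal example of the compute-centric, message-passing style listed for HPC in Table 3. It uses the mpi4py Python bindings (an assumption; the paper names only MPI itself): each rank owns a chunk of synthetic data and contributes a partial sum to a global reduction. The script name in the run command is hypothetical.

```python
# Minimal message-passing sketch: each rank reduces its local chunk and the
# partial results are combined on rank 0.
# Run with, for example:  mpiexec -n 4 python mpi_sum.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank generates (or, in practice, would load) its own slice of the data.
rng = np.random.default_rng(rank)
local = rng.random(1_000_000)
local_sum = local.sum()

# Combine the per-rank partial sums into a global sum on rank 0.
global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"global sum over {size} ranks: {global_sum:.2f}")
```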