BigDataInterviewQuestions.txt | Apache Hadoop | Data Model

Transcript
Structured Data: Data that resides in a fixed field within a record or file is called structured data.

Semi-structured Data: A form of structured data that does not conform to the formal structure of the data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data.

Unstructured Data: Information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.

BigData: Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. It describes a massive volume of structured and unstructured data that is so large that it is difficult to process using traditional database techniques.

Hadoop: Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity hardware using a simple programming model.

HDFS: HDFS is a file system designed for storing very large data sets with a streaming data access pattern, running on commodity hardware.

The term secondary name-node is somewhat misleading. It is not a name-node, in the sense that data-nodes cannot connect to the secondary name-node, and in no event can it replace the primary name-node in case of its failure.

The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edits log files, joins them into a new image, and uploads the new image back to the (primary and only) name-node. See User Guide.

So if the name-node fails and you can restart it on the same physical node, there is no need to shut down the data-nodes; only the name-node needs to be restarted. If you cannot use the old node anymore, you will need to copy the latest image somewhere else.
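
The checkpoint step described above (join the current image with the edits log to produce a new image) can be pictured with a toy model. This is only an illustrative sketch: the set-of-paths image, the (operation, path) edit tuples, and the function name apply_edits are assumptions made for the example, not HDFS's actual file formats or API.

```python
def apply_edits(image, edits):
    """Return a new namespace image with each logged edit applied in order.

    `image` is a set of file paths and `edits` is a list of
    (operation, path) tuples. Both shapes are hypothetical stand-ins
    for the real name-node image and edits log.
    """
    new_image = set(image)  # copy, so the old image stays untouched
    for operation, path in edits:
        if operation == "create":
            new_image.add(path)
        elif operation == "delete":
            new_image.discard(path)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return new_image


# State at checkpoint time: one file in the image, two edits logged since.
image = {"/data/a.txt"}
edits = [("create", "/data/b.txt"), ("delete", "/data/a.txt")]

# The checkpoint is the old image with the logged edits folded in.
checkpoint = apply_edits(image, edits)

# An edit logged after the checkpoint is not part of the checkpoint itself.
later_edits = [("create", "/data/c.txt")]
current_namespace = apply_edits(checkpoint, later_edits)
```

In this model, checkpoint holds only /data/b.txt, while current_namespace also holds /data/c.txt: anything logged after a checkpoint exists only in the edits log until the next checkpoint is taken.
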
The latest image can be found either on the node that used to be the primary before the failure, if it is still available, or on the secondary name-node. The latter will be the latest checkpoint without subsequent edits logs; that is, the most recent namespace modifications may be missing there. You will also need to restart the whole cluster in this case.

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapreduce.tasktracker.reduce.tasks.maximum).

With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers in order to reserve a few reduce slots in the framework for speculative tasks and failed tasks.
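
As a worked example of the rule of thumb above, the arithmetic can be sketched in a few lines of Python. The function name suggested_reduces and the choice to round down to a whole number of reduces are assumptions made for illustration; the text only gives the two scaling factors and the product of node count and per-node reduce slots.

```python
from fractions import Fraction

# The two scaling factors from the text, kept as exact fractions so the
# result does not depend on floating-point rounding.
SINGLE_WAVE = Fraction(95, 100)   # all reduces launch immediately
TWO_WAVES = Fraction(175, 100)    # faster nodes run a second wave


def suggested_reduces(nodes, reduce_slots_per_node, factor):
    """Return factor * (nodes * reduce slots per node), rounded down.

    `reduce_slots_per_node` stands for the value of
    mapreduce.tasktracker.reduce.tasks.maximum. Rounding down is an
    assumption of this sketch, not something the text specifies.
    """
    total_slots = nodes * reduce_slots_per_node
    return int(factor * total_slots)


# Hypothetical cluster: 10 nodes with 2 reduce slots each (20 slots total).
one_wave = suggested_reduces(10, 2, SINGLE_WAVE)
two_waves = suggested_reduces(10, 2, TWO_WAVES)
```

For this hypothetical 20-slot cluster, one_wave is 19 and two_waves is 35, so in the single-wave case one slot stays free for speculative or failed tasks.
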