Tuesday, September 28, 2010
 
Who’s who
Hive, HBase and Pig all compile their queries and statements into collections of MapReduce jobs, handling the design and chaining of those jobs as well as the gathering of their results. They are meant to save you time and plenty of headaches: the code scattering and highly coupled functions you can easily fall into when you design complex job chains, share state between jobs, or build your own database tool by writing nothing but Java jobs implementing the MapReduce contracts.
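For reference, this is roughly what one of those hand-written Java jobs looks like: a minimal sketch of a plain word count written against the org.apache.hadoop.mapreduce API, with hypothetical input and output paths taken from the command line.

```java
// A minimal sketch of the kind of hand-written Java job described above:
// a plain word count using the org.apache.hadoop.mapreduce API.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The map contract: one (offset, line) pair in, zero or more (word, 1) pairs out.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // The reduce contract: all counts for one word arrive together and are summed.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```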
Instead, you can use each of them for the purpose it was designed for.
Pig is recommended for chewing and manipulating semi-structured data, and for exploring and parsing it. It also has some primitive collection datatypes, and Zebra integrates with it as a simple database facility, so you can avoid unnecessary custom data persistence mechanisms.
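As a rough illustration (keeping all the examples here in Java), a Pig Latin pipeline can be driven from Java through Pig's PigServer class; the log file, its field layout and the filter condition below are made-up placeholders.

```java
// A hedged sketch of embedding a Pig Latin pipeline in Java through PigServer.
// The input path, field names and filter condition are hypothetical.
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigEmbedding {
  public static void main(String[] args) throws Exception {
    PigServer pig = new PigServer(ExecType.MAPREDUCE);

    // LOAD a tab-delimited, semi-structured log, naming the fields on the fly...
    pig.registerQuery("raw = LOAD 'logs/2010-09-28' USING PigStorage('\\t') "
        + "AS (ts:chararray, level:chararray, msg:chararray);");
    // ...keep the interesting rows, then group and count them.
    pig.registerQuery("errors = FILTER raw BY level == 'ERROR';");
    pig.registerQuery("by_ts = GROUP errors BY ts;");
    pig.registerQuery("counts = FOREACH by_ts GENERATE group, COUNT(errors);");

    // STORE is what actually triggers compilation into MapReduce jobs.
    pig.store("counts", "output/error-counts");
  }
}
```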
For Hive and HBase you'll need the data well structured and clean; don't even think about feeding them raw data.
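To give an idea of the structure HBase expects up front, here is a hedged sketch against its Java client API; the table, column family and row key are hypothetical, and the table is assumed to have been created beforehand.

```java
// A hedged sketch of reading and writing a single HBase cell. Every cell is
// addressed by (row key, column family, qualifier), so the structure has to be
// decided before any data goes in. Names here are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseStructuredAccess {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "page_views");   // table created beforehand

    // Write one cell: row "2010-09-28#home", family "stats", qualifier "hits".
    Put put = new Put(Bytes.toBytes("2010-09-28#home"));
    put.add(Bytes.toBytes("stats"), Bytes.toBytes("hits"), Bytes.toBytes(42L));
    table.put(put);

    // Read the same cell back.
    Get get = new Get(Bytes.toBytes("2010-09-28#home"));
    Result result = table.get(get);
    long hits = Bytes.toLong(result.getValue(Bytes.toBytes("stats"), Bytes.toBytes("hits")));
    System.out.println("hits = " + hits);

    table.close();
  }
}
```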

This is one possible scenario, where several sources of data, in different formats or at different maturity levels, converge into the Hadoop distributed filesystem and are chewed there by custom jobs or scripts. The result can then be used as structured information for later, higher-level purposes.
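A hedged sketch of the first hop in that scenario, copying raw exports into HDFS with the FileSystem API (all paths below are made up):

```java
// A hedged sketch of pushing raw local files into HDFS so later jobs or
// scripts can chew on them. The local and HDFS paths are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml settings
    FileSystem fs = FileSystem.get(conf);

    // Raw exports from different sources land in one staging directory...
    fs.copyFromLocalFile(new Path("/data/exports/crm-dump.csv"),
                         new Path("/staging/raw/crm-dump.csv"));
    fs.copyFromLocalFile(new Path("/data/exports/web-logs.gz"),
                         new Path("/staging/raw/web-logs.gz"));

    // ...and the cleaned, structured results get their own area for later use.
    fs.mkdirs(new Path("/warehouse/structured"));

    fs.close();
  }
}
```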




So, depending on your requirements, you'll have to balance between faster but overly simple logical tasks in MapReduce (chaining them will be a headache) and slower but more maintainable designs with clean code using Hive queries or HBase methods. Sometimes you won't even have to decide: writing Java code in a pure MR approach can be a nightmare, and you may end up with so many small routines that they hide the real business purpose of their existence.
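To make the chaining concern concrete, here is a minimal sketch of hand-chaining two jobs, where the second job simply reads the first one's output directory; the paths are placeholders and the identity Mapper and Reducer stand in for the real parsing and aggregation logic.

```java
// A minimal sketch of hand-chaining two MapReduce jobs: run one, point the
// next at its output, and check each link yourself. Paths are hypothetical;
// identity Mapper/Reducer classes stand in for the real logic.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path raw = new Path("/staging/raw");          // hypothetical input
    Path cleaned = new Path("/staging/cleaned");  // intermediate hand-off directory
    Path report = new Path("/warehouse/report");  // final output

    // First job: parse and clean the raw records.
    Job clean = new Job(conf, "clean");
    clean.setJarByClass(ChainedJobs.class);
    clean.setMapperClass(Mapper.class);           // identity, in place of real parsing
    FileInputFormat.addInputPath(clean, raw);
    FileOutputFormat.setOutputPath(clean, cleaned);
    if (!clean.waitForCompletion(true)) {
      System.exit(1);                             // the chain stops if a link fails
    }

    // Second job: aggregate whatever the first job wrote.
    Job aggregate = new Job(conf, "aggregate");
    aggregate.setJarByClass(ChainedJobs.class);
    aggregate.setMapperClass(Mapper.class);       // identity stand-ins again
    aggregate.setReducerClass(Reducer.class);
    FileInputFormat.addInputPath(aggregate, cleaned);
    FileOutputFormat.setOutputPath(aggregate, report);
    System.exit(aggregate.waitForCompletion(true) ? 0 : 1);
  }
}
```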
Still, although Pig is one of the most used methods for processing data, MapReduce is also commonly used for its simplicity and performance, and because it is the base of a programming paradigm for synthesizing vast amounts of data into the information we need. Sometimes, too, huge volumes of data have to be processed for very simple tasks, and you won't need to prepare a database design or strategy using Hive or HBase.
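For contrast, when the data is structured enough to justify that database strategy, the same kind of aggregation can be expressed as Hive queries. The sketch below is only an assumption of a typical setup: a Hive server listening on localhost:10000, reached through Hive's JDBC driver, with a made-up table and columns.

```java
// A hedged sketch of running Hive queries over an already-structured table
// through Hive's JDBC driver. Server location, table and columns are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = conn.createStatement();

    // The schema is declared up front: Hive expects structured, clean data.
    stmt.execute("CREATE TABLE IF NOT EXISTS page_views (ts STRING, page STRING, hits INT) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
    stmt.execute("LOAD DATA INPATH '/warehouse/structured/page_views' INTO TABLE page_views");

    // The query below is compiled into MapReduce jobs behind the scenes.
    ResultSet rs = stmt.executeQuery("SELECT page, SUM(hits) FROM page_views GROUP BY page");
    while (rs.next()) {
      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
    }
    conn.close();
  }
}
```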

A Markitecture
I want to show here an architecture platform centered on the MapReduce and HDFS projects. I left ZooKeeper out of this picture because it isn't integrated with MapReduce or HDFS but is only used by HBase; the same goes for Avro, which is rather a backend library used by the core.

A summary of how they all interact, which languages they use, and how the platform deploys their processes:



I’ll be posting individual diagrams for them all.

 
