Tuesday, November 2, 2010
For newcomers, here is a simple way to understand Hadoop's design and its main goals.
Compare: one of the first mistaken assumptions!
A typical transactional system lives in the range of a few gigabytes of data, accessed randomly. It runs inside an application server, and it accesses and controls a relational database hosted on a separate server, moving data in and out of it.
It interacts with online clients, keeping the amount of data in transit small over shared, limited bandwidth; the workload is mostly continuous reads over small sets of data, combined with some maintenance & CRUD operations.
Larger data processing is done in batches, but the architecture stays the same.
But what about a different scenario?
What happens if we need to process 1 petabyte per week? Or let's say a much smaller volume: 1 terabyte per day, or even just 100 gigabytes per day?
Under the traditional scheme, it would be like walking an elephant across a tightrope thousands of times: the time windows would not allow working with real-time information; you would always be trying to catch up and losing the race.
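A quick back-of-envelope calculation shows why the race is lost. The link speed below is an illustrative assumption (a dedicated gigabit link at roughly 125 MB/s, ignoring protocol overhead and contention), not a figure from the post:

```python
# Rough arithmetic behind the "losing the race" claim.
# Assumption: a dedicated 1 Gb/s link moving data at ~125 MB/s.
LINK_BYTES_PER_SEC = 125e6

def transfer_hours(bytes_per_day: float) -> float:
    """Hours needed to ship one day's worth of data over the link."""
    return bytes_per_day / LINK_BYTES_PER_SEC / 3600

for label, size in [("100 GB/day", 100e9),
                    ("1 TB/day", 1e12),
                    ("1 PB/week", 1e15 / 7)]:
    print(f"{label}: {transfer_hours(size):.1f} h per day of data")
```

At 100 GB/day the transfer is trivial; at 1 TB/day it already eats hours of the window; at petabyte-per-week scale, shipping a single day's data takes far longer than a day, so the system can never catch up.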
Want problems?
The network would saturate, batch jobs would take days to process hours' worth of information, and hard-disk access latency would turn into ever-growing overhead, with a painful impact on overall costs.
The traditional approach would keep the architecture and just change the hardware: ever more sophisticated requirements, each time more expensive and bigger, but on top of an architecture with limited growth. How many times will you redesign your system around non-linearly scaling solutions to keep up with the race of growing data?
The solution lies in changing the focus!
Problem 1: Size of data
Systems handling public data can have huge processing flows of hundreds of terabytes, and public websites have grown their data up to petabytes!
Strategy 1: Distribute and parallelize the processing
If chewing through 200 TB of data takes 11 days on 1 computer, let's divide the whole job across as many machines as needed to finish in minutes or hours!
And if we are going to have lots of computers, let them be cheap and easy to maintain, in a simple add-&-replace node architecture.
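The speedup is simple division. Working backwards from the post's own figure (200 TB in 11 days on one machine implies a sustained rate of roughly 210 MB/s per node, a derived assumption):

```python
# Back-of-envelope parallel speedup, using the post's own figure:
# ~200 TB in ~11 days on one machine implies ~210 MB/s per node.
PER_NODE_BYTES_PER_SEC = 210e6

def processing_days(total_bytes: float, nodes: int) -> float:
    """Days to scan total_bytes with `nodes` machines in parallel,
    assuming the work splits evenly and ignoring coordination overhead."""
    return total_bytes / (PER_NODE_BYTES_PER_SEC * nodes) / 86400

print(round(processing_days(200e12, 1)))        # ~11 days on one machine
print(round(processing_days(200e12, 100) * 24, 1))  # a few hours on 100 nodes
```

Of course real jobs don't split perfectly evenly, which is exactly why the scheduling and coordination belong in the platform, not the application.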
Problem 2: Continuous failure and cluster growth
In a big cluster, lot of machines fail everyday, besides the cluster size cannot be fixed and sometimes cannot be planned, it should grow easily at the speed of data.
Strategy 2: High fault tolerance and high scalability
Any design must have an excelent fault-tolerance mechanism, a very flexible maintenance, and its service availability should keep up naturally to its cluster size.
So we need a distributed storage, atomic, transactional, with perfect linear scalability.
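Replication is the usual mechanism behind that fault tolerance: a block of data is only lost if every node holding a replica dies before the system re-replicates it. A minimal sketch of the arithmetic, where the 1% per-node failure probability per repair window is our illustrative assumption:

```python
# Why replication tames constant machine failure: a block disappears
# only if ALL nodes holding a replica fail before re-replication.
# The 1% failure probability per repair window is an assumed figure.

def block_loss_probability(p_node_failure: float, replicas: int) -> float:
    """Probability that all `replicas` independent copies are lost."""
    return p_node_failure ** replicas

for replicas in (1, 2, 3):
    print(replicas, block_loss_probability(0.01, replicas))
```

Each extra replica multiplies the loss probability by the (small) per-node failure probability, which is why three copies on commodity hardware can beat one copy on expensive hardware.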
Problem 3: Exponential overhead in data transport
Between the data source and its processing location, poor hard-disk latency and saturated networks will eventually blow up.
Strategy 3: Move application logic to the data storage
The platform must handle data and processing separately and transparently: applications should only care about their own business logic, and the platform should automatically ship that logic to the data storage, execute it, and collect the results.
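The programming model that makes shipped-around application logic practical is MapReduce: the developer writes two small functions and the platform runs them next to the data. A toy word count in plain Python, mimicking the model's map / shuffle / reduce phases (this is a conceptual sketch, not the Hadoop API):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word — runs where the data lives."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```

Only the tiny map and reduce functions are application code; everything else (splitting input, moving intermediate pairs, retrying failed tasks) is the platform's job, which is precisely Strategy 3.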
So...
- Distribute and parallelize processing
- High fault tolerance, high scalability
- Bring application processing to the data storage
One Solution: Hadoop Distributed Filesystem + Hadoop MapReduce
Still, Hadoop is not bullet-proof. It is actually a very specific solution for certain big-data scenarios; for needs such as real-time processing or random access to data, there are complementary technology choices (see HBase, Hive).
Hadoop is not designed to replace an RDBMS. Rather, it has proven able to handle huge amounts of data far more efficiently, where traditional enterprise database clusters wouldn't come anywhere close at the same overall cost!