Tuesday, November 2, 2010
For newcomers, here is a simple way to understand Hadoop's design and main goals.

Compare: one of the first mistaken attitudes!
A typical transaction system works with a few gigabytes of data, accessed randomly. It runs inside an application server and accesses and controls a relational database hosted on a different server, moving data in and out of it.
It interacts with online clients and keeps the amount of data in transit small, over shared and limited bandwidth. Its operations are mostly continuous reads over small sets of data, combined with some maintenance and CRUD operations.
Bigger data processing is done in batch, but the architecture is the same.

But what about a different scenario?
What happens if we need to process 1 petabyte per week? Or something much smaller, say 1 terabyte per day, or even 100 gigabytes per day?
Under the traditional scheme, it would be like moving an elephant over a string thousands of times. The time windows would not allow working with real-time information; the system would always be trying to keep up and losing the race.

Want problems?
The network would saturate, batch processes would take days to process hours' worth of information, and hard disk access latency would turn into a growing overhead, with a painful impact on overall costs.
The traditional approach would maintain the architecture and change the hardware to meet ever more sophisticated requirements, each time more expensive and bigger, but on an architecture with limited growth. How many times will you rework your system's architecture, with solutions that do not scale linearly, to keep up with the race against growing data?

The solution is in the focus!
Problem 1: Size of data
Systems handling public data can have huge processing flows of hundreds of terabytes, and public websites have grown their data up to petabytes!
Strategy 1: Distribute and parallelize the processing
If chewing through 200 TB of data takes 11 days on 1 computer, let's divide the whole process across as many machines as needed to finish the task in minutes or hours!
If we are going to have lots of computers, let them be cheap and easy to maintain, in a simple add-and-replace node architecture.
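
As a rough back-of-the-envelope sketch of that idea, the snippet below estimates how many machines the 11-day job would need to finish within a target time. It assumes perfectly linear speedup and ignores coordination overhead, which real clusters never fully achieve; the function name `machines_needed` is just an illustrative choice.

```python
import math


def machines_needed(single_machine_hours: float, target_hours: float) -> int:
    """Machines required to hit the target, ASSUMING ideal linear speedup."""
    if single_machine_hours <= 0 or target_hours <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(single_machine_hours / target_hours)


# The example from the text: 11 days on a single computer.
single_machine_hours = 11 * 24

one_hour_cluster = machines_needed(single_machine_hours, 1.0)
fifteen_minute_cluster = machines_needed(single_machine_hours, 0.25)

print(f"finish in 1 hour:     {one_hour_cluster} machines")
print(f"finish in 15 minutes: {fifteen_minute_cluster} machines")
```

Under this idealized model, hundreds of machines bring the job down to an hour, which is why cheap, easily replaced nodes matter.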

Problem 2: Continuous failure and cluster growth
In a big cluster, lots of machines fail every day. Besides, the cluster size cannot be fixed and sometimes cannot be planned; it should grow easily at the speed of the data.
Strategy 2: High fault tolerance and high scalability
Any design must have an excellent fault-tolerance mechanism and very flexible maintenance, and its service availability should scale naturally with its cluster size.
So we need distributed storage that is atomic and transactional, with perfect linear scalability.
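
One common way to get this kind of fault tolerance is to keep several copies of each data block on different machines. The toy sketch below assumes a simple round-robin placement with 3 replicas per block (3 is the HDFS default replication factor, but real HDFS placement is rack-aware and more involved). The node names and the helpers `place_blocks` and `readable_blocks` are hypothetical.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a healthy node."""
    return [block for block, holders in placement.items() if holders - failed_nodes]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_blocks(10, nodes)

# Losing any two nodes leaves every block readable with 3 replicas.
after_two_failures = readable_blocks(placement, {"node-a", "node-b"})
# Losing three nodes that hold all copies of a block does lose data.
after_three_failures = readable_blocks(placement, {"node-a", "node-b", "node-c"})

print(len(after_two_failures), "of 10 blocks readable after 2 failures")
print(len(after_three_failures), "of 10 blocks readable after 3 failures")
```

Adding a node to the `nodes` list is all it takes to grow this toy cluster, which mirrors the add-and-replace idea.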

Problem 3: Exponential overhead in data transport
On the way from the data source to its processing location, poor hard disk latency and saturated networks will end up making the overhead explode.
Strategy 3: Move application logic to data storage
The platform must allow data and processes to be handled separately and transparently: applications should take care of their own business code, and the platform should automatically deploy it to the data storage, execute it, and handle the results.

So...
  • Distribute and parallelize processing
  • High fault tolerance, high scalability
  • Take application processes to data storage

One Solution: Hadoop Distributed Filesystem + Hadoop MapReduce
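
To make the MapReduce model a bit more concrete, here is a toy, single-process word count in plain Python. It is only a sketch of the programming model, not Hadoop's actual API: the `blocks` list stands in for file blocks stored on different nodes, and `map_phase`, `shuffle`, `reduce_phase` and `run_job` are illustrative names. On a real cluster the map function would be shipped to the node holding each block, so only small intermediate pairs cross the network.

```python
from collections import defaultdict


def map_phase(block):
    """Emit (word, 1) for every word in one block of lines."""
    for line in block:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(blocks):
    mapped = []
    for block in blocks:  # each iteration stands in for a map task on one node
        mapped.extend(map_phase(block))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two "blocks" of a file, as if stored on two different nodes.
blocks = [
    ["big data big cluster"],
    ["data moves less", "code moves more"],
]
counts = run_job(blocks)
print(counts)
```

The application author writes only the map and reduce functions; distribution, grouping and result handling belong to the platform.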

That said, Hadoop is not a silver bullet. It is actually a very specific solution for some big-data scenarios; needs such as real-time processing or random access to data call for other technology choices (see HBase, Hive).

Hadoop is not designed to replace the RDBMS. Instead, it has been proven to handle huge amounts of data far more efficiently, at overall costs that traditional enterprise database clusters could not come close to!
