Apache Spark is one of the fastest distributed computing engines available today. It provides an excellent set of libraries to help you handle any volume...
Tag: Hadoop
Master Apache SQOOP with Big Data Hadoop
In this Apache Sqoop tutorial, you will learn everything that you need to know about Apache Sqoop and how to integrate it within Big...
Everything you need to know about Hadoop Shell
Hadoop Shell is a Linux-like terminal utility that can be used to interact with Hadoop’s distributed file system. For Linux users it will...
How to setup Apache Hadoop Cluster on a Mac or Linux Computer
If you have checked our post on How to Quickly Setup Apache Hadoop on Windows PC, then you will find in this post that its...
How to avoid small files problem in Hadoop
Are you looking to avoid the small files problem in Hadoop? Read below to learn exactly where to look and how to avoid small...
Why Large number of files on Hadoop is a problem and how to fix it?
There are multiple reasons for a large number of files on Hadoop. Hadoop has its own file system to store data in the form of files;...
The Best Data Processing Architectures: Lambda vs Kappa
In the big data world, things are changing too quickly to keep up with, and so is the size of data that an application should handle. If...
How to Quickly Setup Apache Hadoop on Windows PC
Hadoop is an open-source distributed storage and processing software framework sponsored by the Apache Software Foundation. Its core technology is based on Java as...
6 Reasons Why Hadoop is THE Best Choice for Big Data Applications
People often ask us: What is big data? What is Hadoop? Where did it come from? And why is it such a hot topic...
ELK Stack (ElasticSearch – LogStash – Kibana) including hands-on practicals with Apache Hadoop, Hive, PIG & MapReduce
A complete ElasticSearch tutorial for beginners through advanced-level professionals. Learn how to use ElasticSearch with Apache Hadoop and build various real-world big data...
How to find bad partitions in a huge HIVE table
Recently we found an issue with the use of ANALYZE TABLE queries in Hive, where the ANALYZE command was changing the ‘LOCATION’ property of random partitions in...