Hadoop - DataShark Academy

Unlocking Big Data: Exploring the Power of Apache Spark for Distributed Computing

This post may contain affiliate links. Please read our disclosure for more info.

Apache Spark is one of the fastest distributed computing engines available today. It provides an excellent set of libraries to help you handle any volume...

Mastering Apache Sqoop with Hortonworks Sandbox, Hadoop, Hive & MySQL

Master Apache Sqoop with Big Data Hadoop

In this Apache Sqoop tutorial, you will learn everything that you need to know about Apache Sqoop and how to integrate it within Big...

Everything you need to know about Hadoop Shell

Hadoop Shell is a Linux-like terminal utility that can be used to interact with Hadoop’s distributed file system. For Linux users it will...

How to setup Apache Hadoop Cluster on a Mac or Linux Computer

If you have checked our post on How to Quickly Setup Apache Hadoop on Windows PC, then you will find in this post that its...

How to avoid small files problem in Hadoop

Are you looking to avoid the small files problem in Hadoop? Read below to learn exactly where to look and how to avoid small...

Why Large number of files on Hadoop is a problem and how to fix it?

There are multiple reasons for a large number of files on Hadoop. Hadoop has its own file system to store data in the form of files;...

The Best Data Processing Architectures: Lambda vs Kappa

In the big data world, things change too quickly to keep up with, and so does the size of the data that an application must handle. If...

How to Quickly Setup Apache Hadoop on Windows PC

Hadoop is an open-source distributed storage and processing software framework sponsored by the Apache Software Foundation. Its core technology is based on Java as...

6 Reasons Why Hadoop is THE Best Choice for Big Data Applications

People often ask us: What is big data? What is Hadoop? Where did it come from? And why is it such a hot topic...

Complete ElasticSearch Integration with LogStash, Hadoop, Hive, Pig, Kibana and MapReduce

ELK Stack (ElasticSearch – LogStash – Kibana) including hands-on practicals with Apache Hadoop, Hive, Pig & MapReduce

A complete ElasticSearch tutorial for beginners through advanced-level professionals. Learn how to use ElasticSearch with Apache Hadoop and build various real-world big data...

How to find bad partitions in a huge Hive table

Recently we found an issue with the use of ANALYZE TABLE queries in Hive, where the ANALYZE command was changing the ‘LOCATION’ property of random partitions in...