There are multiple reasons for a large number of files on Hadoop. Hadoop has its own file system to store data in the form of files;...
Category: Tutorials
Welcome to our tutorials category, your one-stop destination for comprehensive and practical tutorials on various technical topics such as data science, big data, Python, ElasticSearch, AWS, Cloud Systems, and more.
At DataShark Academy, we bring you step-by-step guides, hands-on demonstrations, and in-depth tutorials on various tools, technologies, and concepts to help you gain practical skills and knowledge in the ever-evolving field of technology.
The Best Data Processing Architectures: Lambda vs Kappa
In the big data world, things change too quickly to keep up with, and so does the size of the data that an application should handle. If...
How to Quickly Setup Apache Hadoop on Windows PC
Hadoop is an open-source distributed storage and processing software framework sponsored by the Apache Software Foundation. Its core technology is based on Java as...
6 Reasons Why Hadoop is THE Best Choice for Big Data Applications
People often ask us: What is big data? What is Hadoop? Where did it come from? And why is it such a hot topic...
How to find bad partitions in a huge HIVE table
Recently we found an issue with the use of ANALYZE TABLE queries in Hive, where the ANALYZE command was changing the ‘LOCATION’ property of random partitions in...
How to compile unmanaged libraries within a SCALA application
In this post, I will share how to compile and package old or unmanaged libraries with your Scala application. Recently I was building a...
Installing Spark – Scala – SBT (S3) on Windows PC
You might already be aware of the hottest new technologies in the world today. Yes, if you thought about Apache Spark, Scala and...