How to use sqoop option files - DataShark.Academy

How to use Sqoop Option File

This post may contain affiliate links. Please read our disclosure for more info.

This solution is part of our Apache Sqoop tutorial. In this exercise, you will learn about how to use sqoop option file to store common configurations at one place. Sqoop option file make it very easy to invoke sqoop commands later on as we will see in this post.

You can read more about Hadoop in our post – 6 Reasons Why Hadoop is THE Best Choice for Big Data Applications.

How to use sqoop option files - DataShark.Academy

Why you must learn about Apache Sqoop?

Apache Sqoop architecture is designed to import data from relational databases such as Oracle, MySQL, etc to Hadoop systems. Hadoop is ideal for batch processing of huge amounts of data. It is industry standard nowadays.

In real world scenarios, using sqoop you can transfer the data from relational tables into Hadoop and then leverage the parallel processing capabilities of Hadoop to process huge amounts of data and generate meaningful data insights. The results of Hadoop processing can again be stored back to relational tables using sqoop export functionality.

As sqoop is one of the integral part of Big data hadoop systems, it is often asked in big data job interviews. This post walks you through a sqoop import example which will help you with those sqoop interview questions.

 

How to install Hadoop

Before proceeding with this exercise, you must have Hadoop installed on your machine.

If you are using a Macbook/Linux computer then refer this step by step guide on installing Hadoop on a Macbook.

In case you are a windows PC user, then refer this guide to install Apache Hadoop on your windows PC.

You might also like:   Simple Sqoop Import using Warehouse Directive

Assuming that you have already installed Apache Hadoop on your computer.

Learn Hadoop- The Definitive Guide - DataShark Academy

Next you need to do 2 more installations before you are ready to run the sqoop import commands:

  1. Install MySQL – For this you can perform mysql download first and then follow instructions to install MySQL on windows or Macbook.
  2. Sqoop Installation – Here as well, first do sqoop download and then follow instructions to install sqoop on windows or Macbook.

The great news is that if you have followed the steps from our guide to install Apache Hadoop on Windows PC, then you don’t need to do either of this aka install MySQL or Sqoop installation. They come pre-built into Hortonworks Data Platform Sandbox package which makes life much easier.

But if you are a Macbook or Linux machine user, then you can use brew utility to install MySQL and Sqoop very easily too. Here are the steps for Mac/Linux users:

Install MySQL

In order to install mysql on a Macbook or Linux machine, just run below command on terminal:

brew install mysql

It will automatically download latest release of MySQL including all dependencies with it and install on your computer.

 

Sqoop Installation

In order to install sqoop on Macbook or Linux machines, just run below command on terminal prompt:

brew install sqoop

This will automatically download sqoop from internet and install it on your machine.

 

Sqoop Option File

After following all installation steps above, you are ready to proceed with sqoop option file exercise. First we will create a text file in local file system and fill in common configuration details in it. Then we will use this file to invoke sqoop import command later.

You might also like:   Sqoop Import Tables Without Primary Keys

Sqoop option file not only helps with sqoop import commands but also with sqoop export process.

Let’s create the sqoop option file now as follows:

:sqoop-option-file.txt

import
--connect
jdbc:mysql://sandbox-hdp.hortonworks.com/sqoop
--username
root
--password
hadoop


How sqoop option files works

As this option file is for sqoop import process, so the first line in the file state the ‘import’ command. Next it has connection directives as follows:

–connect

To connect to database, we used –connect directive followed by JDBC connection string to mysql database.

--connect jdbc:mysql://sandbox-hdp.hortonworks.com/sqoop

For Mac/Linux users, the mysql connection string will look something like:

--connect jdbc:mysql://127.0.0.1:3306/sqoop

–username

This is username for MySQL database

–password

This is password for MySQL database

 

We can also include other directives from sqoop commands such as –table, –target-dir etc if we wish too.

Run Sqoop Import with Option file

Now we have a sqoop option file created. Next let’s see how we can use it in a sqoop import process to copy data from a MySQL table into Hadoop.

BECOME APACHE KAFKA GURU – ZERO TO HERO IN MINUTES

ENROLL TODAY & GET 90% OFF

Apache Kafka Tutorial by DataShark.Academy

sqoop \
--options-file /root/learn/sqoop/sqoop-option-file.txt \
--table employees \
--target-dir /sqoop/employees_ops \
--driver com.mysql.jdbc.Driver


As you see, we don’t need to provide MySQL database connection string anymore. We can simply specify the option file which contains those details.

With sqoop option file feature, we can maintain the connection details at a common place which all developers can access and they won’t need to manually pass in credential information for their commands.

So, we highly recommend to use feature of sqoop option file in production like environments.

You might also like:   Sqoop import specific columns from MySQL

We hope you enjoyed this sqoop lesson on how to run sqoop import command using sqoop option file feature.

 


[jetpack-related-posts]

Leave a Reply

Scroll to top