How to compile unmanaged libraries within a Scala application


In this post, I will share how to compile and package old or unmanaged libraries with your Scala application. Recently I was building a new Scala application that used an older Thrift library, and the SBT compiler gave me a very hard time getting it to work. There are various hacks to work around this problem, but I wanted to stick with SBT's standard flows as much as possible and leverage its core functionality to the fullest. Keep reading to learn how to compile external libraries within a Scala application, along with my experience…

Let's start with a quick recap of what SBT is. SBT is a build tool and dependency manager, much like Maven, that is quite easy to use and is gaining a lot of attention in the Scala and Spark worlds.

If you haven't installed SBT on your PC yet, then check out my other post with a step-by-step guide to installing Spark, Scala, and SBT.

Now let's talk a little about what happens behind the scenes when the SBT compiler is invoked. When you type sbt compile at the command prompt, SBT looks for a file called build.sbt (you can name it anything.sbt as well) in the project's root directory. In this file, which is equivalent to Maven's pom.xml, you specify the Scala version, the name of your application (used to name the final packaged JAR), and the dependencies needed by your Scala application. Here's a sample build.sbt file:

name := "My Scala Project"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
 
libraryDependencies ++= Seq(
    "org.apache.hadoop" % "hadoop-core" % "0.20.2",
    "org.apache.hbase" % "hbase" % "0.90.4"
   
)
 
scalacOptions += "-target:jvm-1.7"
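A quick note on the notation in this file: a double %% tells SBT to append your Scala binary version to the artifact name (with scalaVersion 2.10.5, "org.apache.spark" %% "spark-core" % "1.6.0" resolves the artifact spark-core_2.10), while a single % uses the artifact name as-is, which is what pure Java libraries such as hadoop-core and hbase need.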

Once the build.sbt file is located, SBT recursively downloads all dependencies, including the transitive ones needed by the libraries themselves.


For instance, in this sample build.sbt file, I want SBT to compile my Scala application using spark-core version 1.6.0, spark-hive version 1.6.0, and spark-sql version 2.1.0, pretty much like what we would do in any Maven application. In addition, I also want two more dependencies, hadoop-core 0.20.2 and hbase 0.90.4, which I declared together within Seq(). There is no difference between dependencies declared outside Seq() or within it; they are just two different ways of declaring the same thing.


Alright, let's now focus on the HBase dependency:

"org.apache.hbase" % "hbase" % "0.90.4"

That's where I faced most of the problems. hbase 0.90.4 internally uses the Thrift library, and since hadoop-core is set to version 0.20.2, by default SBT will try to download org.apache.thrift#thrift;0.2.0. Unfortunately, thrift-0.2.0 wasn't easily available on org.apache or mvnrepository.com for SBT to download. I had to search for it a little, and finally I found it under the Red Hat GA repository at

https://maven.repository.redhat.com/ga/org/apache/thrift/thrift/0.2.0/
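As a side note, since this is a standard Maven-layout repository, you could also point SBT at it directly with a custom resolver in build.sbt instead of (or in addition to) dropping files into the Ivy directories described below. A minimal sketch:

resolvers += "RedHat GA" at "https://maven.repository.redhat.com/ga/"

With this line in place, SBT will also search that repository when resolving org.apache.thrift#thrift;0.2.0, just like any other managed dependency.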

In any case, the downloaded JARs and libraries are stored by default under

/home/user/.ivy2/cache/

That's where all downloaded dependencies and external libraries are cached and made available for any subsequent builds. This avoids downloading the same dependencies every time you compile the application.
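If you just want SBT to resolve and cache the dependencies without compiling anything, you can run the update task on its own:

sbt update

This populates the Ivy cache and prints an unresolved-dependency error for anything it cannot find, which is a quick way to check whether a problematic artifact such as thrift-0.2.0 is actually downloadable from your configured repositories.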

If you want to force SBT to use local JARs or libraries instead, here's how you can do it:

  1. Create a directory at
/home/user/.ivy2/local

2. Under this directory, create subdirectories for the library (or libraries) you want to provide locally. I will take the case of thrift-0.2.0, so I added the following directory structure:

/home/user/.ivy2/local/org.apache.thrift/thrift/0.2.0/ivys

 


 

Now there are two ways to get the desired thrift.jar compiled into your application with SBT.

First Method

Create an ivy.xml file and let SBT resolve the dependencies in the usual way.

For this, you can create a file called ‘ivy.xml’ under

/home/user/.ivy2/local/org.apache.thrift/thrift/0.2.0/ivys

Note: keep the name of this file exactly as mentioned. SBT specifically looks for an ivy.xml file at this path.

Then add the following content to the ivy.xml file (you may need to change it according to the libraries you need):

<?xml version="1.0" encoding="UTF-8"?>
<ivy-module version="2.0" xmlns:m="http://ant.apache.org/ivy/maven" xmlns:e="http://ant.apache.org/ivy/extra">
        <info organisation="org.apache.thrift"
                module="thrift"
                revision="0.2.0"
                status="release"
                publication="20141118121911"
        >
                <license name="The Apache Software License, Version 2.0" url="http://www.apache.org/licenses/LICENSE-2.0.txt" />
                <description homepage="http://thrift.apache.org">
                Thrift is a software framework for scalable cross-language services development.
                </description>
                <e:sbtTransformHash>e0c5ee03acc03200c1cebed43f387c2bf613e676</e:sbtTransformHash>
        </info>
        <configurations>
                <conf name="default" visibility="public" description="runtime dependencies and master artifact can be used with this conf" extends="runtime,master"/>
                <conf name="master" visibility="public" description="contains only the artifact published by this module itself, with no transitive dependencies"/>
                <conf name="compile" visibility="public" description="this is the default scope, used if none is specified. Compile dependencies are available in all classpaths."/>
                <conf name="provided" visibility="public" description="this is much like compile, but indicates you expect the JDK or a container to provide it. It is only available on the compilation classpath, and is not transitive."/>
                <conf name="runtime" visibility="public" description="this scope indicates that the dependency is not required for compilation, but is for execution. It is in the runtime and test classpaths, but not the compile classpath." extends="compile"/>
                <conf name="test" visibility="private" description="this scope indicates that the dependency is not required for normal use of the application, and is only available for the test compilation and execution phases." extends="runtime"/>
                <conf name="system" visibility="public" description="this scope is similar to provided except that you have to provide the JAR which contains it explicitly. The artifact is always available and is not looked up in a repository."/>
                <conf name="sources" visibility="public" description="this configuration contains the source artifact of this module, if any."/>
                <conf name="javadoc" visibility="public" description="this configuration contains the javadoc artifact of this module, if any."/>
                <conf name="optional" visibility="public" description="contains all optional dependencies"/>
        </configurations>
 
        <publications>
                <artifact name="thrift" type="jar" ext="jar" conf="master" />
        </publications>
        <dependencies>
                <!-- https://mvnrepository.com/artifact/org.apache.thrift/thrift -->
                <!--<dependency org="org.apache.thrift" name="thrift" rev="0.2.0"/>
                -->
          </dependencies>
</ivy-module>

This should get you thrift.jar, as well as any other dependent libraries internally used by Thrift or specified under the <dependencies> tag above.
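One reason this works is that SBT's default resolver chain includes the local Ivy repository at /home/user/.ivy2/local and consults it before any remote repositories. So once the ivy.xml is in place, re-running the build should resolve thrift from there:

sbt clean update

If the module is still not found, double-check that the organisation, module, and revision in ivy.xml exactly match the coordinates requested in build.sbt.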



But hold on, there's an easier way too, which I learned the hard way!

Second Method

After step 2 above, create another directory called 'jars', so that your directory structure looks like this:

/home/user/.ivy2/local/org.apache.thrift/thrift/0.2.0/jars

Then download the required version of the JAR file into this directory. On Linux or macOS you can use wget; on Windows you can download it manually from a browser:

wget https://maven.repository.redhat.com/ga/org/apache/thrift/thrift/0.2.0/thrift-0.2.0.jar

Give the file appropriate permissions:

chmod 754 thrift-0.2.0.jar
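On a related note (this is not what I did here, just a common SBT convention worth knowing), SBT also treats any JAR dropped into a lib/ folder at the project root as an unmanaged dependency, with no build changes needed. Alternatively, you can add a single local JAR to the compile classpath explicitly; a rough sketch in sbt 0.13 syntax, assuming the file sits under lib/:

unmanagedJars in Compile += Attributed.blank(file("lib/thrift-0.2.0.jar"))

Keep in mind that unmanaged JARs skip dependency resolution entirely, so this only makes sense when you don't need any of the library's own transitive dependencies.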

Working behind a firewall?

If your server is behind a firewall and can't connect directly to external Maven repositories, you can download the JAR file manually on your PC/Mac and upload it to the above path on your server, or create a bash shell script with proxy settings as shown here:

#!/bin/bash
#name: sbtcompiler.sh

export http_proxy=<proxy url>
export https_proxy=${http_proxy}
export ftp_proxy=${http_proxy}
export rsync_proxy=${http_proxy}
 
echo "compiling..."
sbt clean compile
echo "completed sbt compile"
 
echo "building a package..."
sbt package
echo "completed sbt package"

Finally, compile and package your project as:


sbt clean compile package
OR
./sbtcompiler.sh

The above commands consist of three individual SBT commands:

  1. clean – deletes any previous .class or intermediate files
  2. compile – compiles .scala files into .class files
  3. package – creates a JAR file

Finally, you should get a JAR package created at target/scala-xx.x/<project-name>.jar.
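With the sample build.sbt at the top of this post (name "My Scala Project", version 1.0, Scala 2.10.5), that would typically be something like:

target/scala-2.10/my-scala-project_2.10-1.0.jar

The exact file name depends on your project's name, version, and Scala binary version settings.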


If you are interested in reading more about Scala and SBT and want to set up the trio on a local PC, then I think you will like my other article on setting up Spark, Scala, and SBT on a Windows PC.


Please share how this article helped you in the comments below.

