spark run using IDE / Maven

From: http://stackoverflow.com/questions/26892389/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure-task-from-app

  1. Create a fat jar (one that includes all dependencies). Use the Maven Shade Plugin for this. Example pom:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.2</version>
    <configuration>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
    <executions>
        <execution>
            <id>job-driver-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>driver</shadedClassifierName>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <!--
                    Some care is required:
                    http://doc.akka.io/docs/akka/snapshot/general/configuration.html
                    -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>mainClass</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
        <execution>
            <id>worker-library-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>worker</shadedClassifierName>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
  2. Now we have to send the compiled jar file to the cluster. For this, specify the jar file in the Spark config like this:

SparkConf conf = new SparkConf().setAppName("appName").setMaster("spark://machineName:7077").setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});

  3. Run mvn clean package to create the jar file. It will be created in your target folder.

  4. Run using your IDE or using the Maven command:

mvn exec:java -Dexec.mainClass="className"

This does not require spark-submit. Just remember to package the jar before running.

If you don't want to hardcode the jar path, you can do this:

  1. In the config, write:

SparkConf conf = new SparkConf().setAppName("appName").setMaster("spark://machineName:7077").setJars(JavaSparkContext.jarOfClass(this.getClass()));

  2. Create the fat jar (as above) and, after running the package command, run it with:

java -jar target/application-1.0-SNAPSHOT-driver.jar

This will pick up the jar from the location the class was loaded from.
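Putting the steps together, a minimal driver class might look like the sketch below. This is an illustration, not code from the original answer: the class name AppDriver, the app name, the master URL spark://machineName:7077, and the trivial count job are all placeholders you would replace with your own values.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch of a driver that ships its own fat jar to the cluster,
// so you can launch it from the IDE or `mvn exec:java` without spark-submit.
public class AppDriver {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("appName")                       // placeholder
                .setMaster("spark://machineName:7077")       // placeholder master URL
                // Locate the jar this class was loaded from, instead of
                // hardcoding target/appName-1.0-SNAPSHOT-driver.jar.
                .setJars(JavaSparkContext.jarOfClass(AppDriver.class));

        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Trivial job just to verify the workers can run our code.
            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
            System.out.println("count = " + count);
        } finally {
            sc.stop();
        }
    }
}
```

Note that jarOfClass only finds a jar when the class actually runs from one, which is why the fat jar must be built with mvn clean package before launching.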

Original: https://www.cnblogs.com/sunxucool/p/5441277.html
Author: 悟寰轩-叶秋
Title: spark run using IDE / Maven
