PySpark error: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver

Solution:

mv mysql-connector-java-8.0.20.jar $SPARK_HOME/jars/

The driver jar mysql-connector-java-8.0.20.jar was downloaded from the Maven repository.

Then add the following settings to spark-defaults.conf:

spark.driver.extraClassPath   = /home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar
spark.executor.extraClassPath = /home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar

spark.jars = /home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar
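Alternatively, the connector can be attached when the session is built instead of editing spark-defaults.conf. A minimal sketch, assuming the jar path used above (note the caveat about the driver classpath in the comment):

```python
from pyspark.sql import SparkSession

# Path assumed from the article's spark-defaults.conf entries
JAR = "/home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar"

# spark.jars ships the connector to the driver and executors at session start.
# Caveat: spark.driver.extraClassPath only takes effect if it is set before the
# driver JVM launches, so that property belongs in spark-defaults.conf or on
# the spark-submit command line, not here.
spark = (SparkSession.builder
         .appName("JdbcDriverCheck")
         .config("spark.jars", JAR)
         .getOrCreate())
```

This is environment-dependent configuration (it requires a running Spark installation), so it is shown only as a sketch.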

Test it in either of two ways:

① pyspark --master yarn (then type the code below interactively in the shell)

② spark-submit --master yarn --deploy-mode cluster source_code.py

from pyspark.sql import SparkSession

def map_extract(element):
    # element is a (file_path, content) pair from wholeTextFiles; the year is
    # taken from the four characters before ".txt" in the file name
    file_path, content = element
    year = file_path[-8:-4]
    return [(year, i) for i in content.split("\n") if i]

spark = SparkSession\
    .builder\
    .appName("PythonTest")\
    .getOrCreate()

res = (spark.sparkContext
       .wholeTextFiles('hdfs://Desktop:9000/user/mercury/names', minPartitions=40)
       .map(map_extract)      # one list of (year, line) tuples per file
       .flatMap(lambda x: x)  # flatten into individual (year, line) tuples
       .map(lambda x: (x[0], int(x[1].split(',')[2])))  # keep the count column as int
       .reduceByKey(lambda x, y: x + y))                # sum counts per year

df = res.toDF(["key", "num"])  # rename columns to match the target MySQL table
df.printSchema()
df.show()

# Append the aggregated rows to MySQL table `spark` in database `leaf`
df.write.format("jdbc").options(
    url="jdbc:mysql://Desktop:3306/leaf",
    driver="com.mysql.cj.jdbc.Driver",
    dbtable="spark",
    user="appleyuchi",
    password="appleyuchi").mode('append').save()
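The transformation chain above can be sketched in plain Python on a tiny in-memory sample. The file names and rows below are hypothetical, mimicking the `yobYYYY.txt` layout that the year-slicing in map_extract implies:

```python
# Hypothetical (file_path, content) pairs as wholeTextFiles would yield them
pairs = [
    ("hdfs://Desktop:9000/user/mercury/names/yob1990.txt",
     "Mary,F,1000\nJohn,M,900\n"),
    ("hdfs://Desktop:9000/user/mercury/names/yob1991.txt",
     "Mary,F,1100\n"),
]

def map_extract(element):
    file_path, content = element
    year = file_path[-8:-4]  # four characters before ".txt" -> "1990"
    return [(year, line) for line in content.split("\n") if line]

# flatMap: flatten the per-file lists into individual (year, line) tuples
flat = [t for p in pairs for t in map_extract(p)]

# map: keep the count column (index 2 of the CSV row) as an int
counted = [(year, int(line.split(",")[2])) for year, line in flat]

# reduceByKey: sum the counts per year
totals = {}
for year, n in counted:
    totals[year] = totals.get(year, 0) + n

print(totals)  # {'1990': 1900, '1991': 1100}
```

This is only a sequential illustration of what the RDD pipeline computes; Spark performs the same steps partition by partition.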

Original: https://www.cnblogs.com/RioTian/p/16404769.html
Author: RioTian
Title: PySpark 报错 java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver

