Creating a DataFrame

Method 1: an RDD plus an explicit StructType schema
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType, IntegerType

spark = (SparkSession.builder
         .master("local")
         .appName("Create_DataFrame")
         .getOrCreate())

schema = StructType([StructField("id", LongType(), True),
                     StructField("name", StringType(), True),
                     StructField("age", IntegerType(), True)])

# Create an RDD with the parallelize method of sparkContext
rdd = spark.sparkContext.parallelize([
    (1, "alex", 21),
    (2, "marry", 15),
    (3, "jay", 31)])

# Create the DataFrame from the RDD and the explicit schema
df = spark.createDataFrame(rdd, schema)
df.show()

Method 2: a Python list of tuples plus a list of column names
res = [(1, 'Karol', 19), (2, 'Abby', 20), (3, 'Zena', 18)]
schema = ['id', 'name', 'age']

# With only column names given, Spark infers each column's type from the data
df = spark.createDataFrame(res, schema)
df.show()

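Creating a pandas DataFrame:
res = [(11, 'alex', 29), (22, 'marry', 15), (33, 'jay', 31)]
schema = ['id', 'name', 'age']

a. From a dict of columns:
   df = pd.DataFrame({'id': (1, 2, 3), 'name': ('Karol', 'Abby', 'Zena'), 'age': (19, 20, 18)})
b. From a list of rows (the names must be passed as columns=, because the second positional argument of pd.DataFrame is index):
   df = pd.DataFrame(res, columns=schema)
   print(df)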
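The `columns=` keyword matters because `pd.DataFrame(data, index, columns, ...)` takes `index` as its second positional parameter. A small sketch showing both calls side by side (`df_wrong` is just an illustrative name):

```python
import pandas as pd

res = [(11, "alex", 29), (22, "marry", 15), (33, "jay", 31)]
schema = ["id", "name", "age"]

# Correct: the names become column labels
df = pd.DataFrame(res, columns=schema)
print(df)

# Pitfall: passed positionally, the names become row labels (the index)
# and the columns fall back to the default 0, 1, 2
df_wrong = pd.DataFrame(res, schema)
print(list(df_wrong.columns))  # [0, 1, 2]
print(list(df_wrong.index))    # ['id', 'name', 'age']
```

The positional call does not raise an error here only because the data happens to have three rows and three names; with a different row count it would fail with a length mismatch.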

Original: https://blog.csdn.net/qq_42982682/article/details/127277756
Author: gjx_spider
Title: 创建DataFrame (Creating a DataFrame)

