Converting an RDD to a DataFrame
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import spark.implicits._  // needed for the Dataset[String] produced by map() below
// Define the schema: id and age as integers, name as a string
val fields = Array(StructField("id", IntegerType, true), StructField("name", StringType, true), StructField("age", IntegerType, true))
val schema = StructType(fields)
// Read the text file as an RDD of lines
val peopleRDD = spark.sparkContext.textFile("file:///opt/module/spark/employee.txt")
// Split each line and build Row objects that match the schema
val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0).trim.toInt, attributes(1), attributes(2).trim.toInt))
val peopleDF = spark.createDataFrame(rowRDD, schema)
// Register a temporary view so the table "people" referenced in the SQL query exists
peopleDF.createOrReplaceTempView("people")
val employee = spark.sql("select id, name, age from people")
// Format each row as id:...,name:...,age:... and print without truncation
employee.map(t => "id:" + t(0) + ",name:" + t(1) + ",age:" + t(2)).show(false)
Task: convert employee.txt from an RDD into a DataFrame and print each record in the format id:1,name:Ella,age:36.
However, the output came out differently, as shown in the prompt below.
prompt:
Conversion code:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("RDD to DataFrame").getOrCreate()

# Parse the file into an RDD of (id, name, age) tuples,
# casting id and age to int so they match the schema below
dataRDD = spark.sparkContext.textFile("employee.txt") \
    .map(lambda line: line.split(",")) \
    .map(lambda attrs: (int(attrs[0].strip()), attrs[1].strip(), int(attrs[2].strip())))

# Define the schema for the data
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Create the DataFrame from the RDD
df = spark.createDataFrame(dataRDD, schema)

# Print the DataFrame (tabular output, not the id:...,name:...,age:... format)
df.show()
Printed result:
+---+-------+---+
| id| name|age|
+---+-------+---+
| 1| Alice| 25|
| 2| Bob| 30|
| 3|Charlie| 35|
| 4| David| 40|
+---+-------+---+
Problem description:
I am using PySpark and trying to convert the file employee.txt from an RDD into a DataFrame. I have defined the schema for the data, but I cannot print the result in the format id:1,name:Ella,age:36; the output contains only the column headers and the row values, without the required formatting. How can I improve my code and my problem description so that ChatGPT understands my question better and gives a more useful answer?
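For reference, a minimal PySpark sketch of the formatting step the prompt is asking about might look like the following. It assumes df has already been built as above; the format string and column names come from the schema defined earlier, and both variants shown are only illustrative, not the sole way to do it.
# Map each Row to one "id:...,name:...,age:..." string on the RDD side
formatted = df.rdd.map(
    lambda row: "id:{},name:{},age:{}".format(row["id"], row["name"], row["age"])
)
for line in formatted.collect():
    print(line)

# Alternatively, produce the same format with DataFrame functions
from pyspark.sql.functions import format_string
df.select(
    format_string("id:%d,name:%s,age:%d", df["id"], df["name"], df["age"]).alias("value")
).show(truncate=False)
Including a small sketch like this in the prompt, together with a sample line of employee.txt and the exact desired output line, usually makes it much easier for ChatGPT to see that the question is about output formatting rather than about the RDD-to-DataFrame conversion itself.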