Spark Study Notes: Reading and Writing ScyllaDB

ScyllaDB is compatible with the Cassandra API, so the same methods Spark uses to read and write Cassandra also work for ScyllaDB.

**1. Check which Cassandra version ScyllaDB corresponds to**

```bash
cqlsh:my_db> SHOW VERSION
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
```

**2. Check which Spark and Cassandra versions are compatible**

![Spark Study Notes: Reading and Writing ScyllaDB](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20220812/517519-20211109220014493-1994204559.png)

Reference: https://github.com/datastax/spark-cassandra-connector

**3. Writing to ScyllaDB**

Writing to ScyllaDB with the Dataset API:

```scala
ds2.write
  .mode("append")
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "my_tb", "keyspace" -> "my_db", "output.consistency.level" -> "ALL", "ttl" -> "8640000"))
  .save()
```
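The `ttl` value in the write options is expressed in seconds, so `8640000` corresponds to 100 days. As a minimal sketch, the options map could be derived from a retention period in days rather than a hard-coded number. The object `TtlSketch` and its helpers `ttlSecondsForDays` and `writeOptions` are hypothetical names introduced here for illustration; they are not part of the connector.

```scala
import java.util.concurrent.TimeUnit

object TtlSketch {
  // Hypothetical helper: convert a retention period in days to a TTL in seconds,
  // rendered as a string because DataFrame writer options are string-valued.
  def ttlSecondsForDays(days: Long): String =
    TimeUnit.DAYS.toSeconds(days).toString

  // Hypothetical helper: build the same option keys used in the write above.
  def writeOptions(keyspace: String, table: String, ttlDays: Long): Map[String, String] =
    Map(
      "table" -> table,
      "keyspace" -> keyspace,
      "output.consistency.level" -> "ALL",
      "ttl" -> ttlSecondsForDays(ttlDays)
    )

  def main(args: Array[String]): Unit = {
    // 100 days * 86400 seconds per day = 8640000 seconds
    println(writeOptions("my_db", "my_tb", 100)("ttl"))
  }
}
```

The resulting map can be passed to `.options(...)` in place of the literal `Map` shown above.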

Writing to ScyllaDB with the RDD API:

```scala
import com.datastax.oss.driver.api.core.ConsistencyLevel
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{TTLOption, WriteConf}

ds.rdd.saveToCassandra("my_db", "my_tb", writeConf = WriteConf(ttl = TTLOption.constant(8640000), consistencyLevel = ConsistencyLevel.ALL))
```

Note that the number and order of the fields must match the columns of the ScyllaDB table. You can select the fields explicitly as follows:

```scala
import org.apache.spark.sql.functions.col

// Columns to write, in the order expected by the target table
val columns = Seq[String](
  "a",
  "b",
  "c")
val colNames = columns.map(name => col(name))
val colRefs = columns.map(name => toNamedColumnRef(name))

// ks and table hold the keyspace and table names
val df2 = df.select(colNames: _*)
df2.rdd
  .saveToCassandra(ks, table, SomeColumns(colRefs: _*), writeConf = WriteConf(ttl = TTLOption.constant(8640000), consistencyLevel = ConsistencyLevel.ALL))
```
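Because a mismatch in the number or order of columns only surfaces at write time, it may help to check the selected columns against the expected list before calling `saveToCassandra`. The following is a minimal sketch of such a check in plain Scala. The object `ColumnOrderSketch` and the function `checkColumns` are hypothetical names, not connector APIs.

```scala
object ColumnOrderSketch {
  // Hypothetical helper: returns Right(()) when both lists have the same length
  // and the same names in the same positions, otherwise Left with a description
  // of the first problem found.
  def checkColumns(selected: Seq[String], expected: Seq[String]): Either[String, Unit] =
    if (selected.length != expected.length)
      Left(s"expected ${expected.length} columns, got ${selected.length}")
    else
      selected.zip(expected).zipWithIndex.collectFirst {
        case ((s, e), i) if s != e => s"column $i is '$s', expected '$e'"
      }.toLeft(())

  def main(args: Array[String]): Unit = {
    println(checkColumns(Seq("a", "b", "c"), Seq("a", "b", "c")))
    println(checkColumns(Seq("a", "c", "b"), Seq("a", "b", "c")))
  }
}
```

With a DataFrame, the first argument could be `df2.columns.toSeq`, since `columns` on a DataFrame returns the column names as an array.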

However, the official documentation recommends the DataFrame API over the RDD API:

If you have the option we recommend using DataFrames instead of RDDs

Source: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/4_mapper.md

**4. Reading from ScyllaDB**

```scala
val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "words", "keyspace" -> "test"))
  .load()
```

References: Create/insert data into Azure Cosmos DB Cassandra API from Spark

Cassandra Optimizations for Apache Spark

**5. Cassandra connector parameters**

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md
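As a rough illustration of how these parameters are supplied, the sketch below collects a few connector settings and renders them as `--conf` arguments for `spark-submit`. The key names follow the 3.x reference linked above as far as I know; older connector versions use different names for some of them, so check the reference for the version in use. The values are placeholders, and `ConnectorConfSketch` and `toSubmitArgs` are hypothetical names.

```scala
object ConnectorConfSketch {
  // Placeholder values; key names should be verified against reference.md
  // for the connector version actually deployed.
  val settings: Seq[(String, String)] = Seq(
    "spark.cassandra.connection.host" -> "127.0.0.1",
    "spark.cassandra.connection.port" -> "9042",
    "spark.cassandra.output.consistency.level" -> "LOCAL_QUORUM",
    "spark.cassandra.output.concurrent.writes" -> "5",
    "spark.cassandra.input.split.sizeInMB" -> "512"
  )

  // Hypothetical helper: turn key/value pairs into spark-submit arguments.
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit =
    println(toSubmitArgs(settings).mkString(" "))
}
```

The same pairs can also be set programmatically, for example with `SparkSession.builder().config(key, value)` before the session is created.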

Parameter tuning: Spark + Cassandra, All You Need to Know: Tips and Optimizations

Original: https://www.cnblogs.com/tonglin0325/p/15531196.html
Author: tonglin0325
Title: Spark学习笔记——读写ScyllaDB
