Thanks to Spark, we can now read MySQL tables and CSV files as DataFrames and then join them together conveniently.
The following steps explain how to read a CSV file as a DataFrame in Spark:
1. download the spark-csv and commons-csv jars (here placed under /opt/bigdata/spark_extra_libs/)
2. add them to --jars when starting spark-shell:
/opt/bigdata/spark/bin/spark-shell --master spark://master:7077 --jars "/opt/bigdata/spark_extra_libs/spark-csv_2.10-1.3.0.jar,/opt/bigdata/spark_extra_libs/commons-csv-1.2.jar" --driver-memory 2G --executor-memory 6G
3. code reference:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // use first line of all files as header
  .option("inferSchema", "true") // automatically infer data types
  .load("/tmp/cars.csv")
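Once the CSV is loaded, joining it with a MySQL table works through Spark's JDBC source. Below is a minimal sketch; the database URL, table name (`orders`), join column (`car_id`), and credentials are illustrative assumptions, not from this post, and the MySQL JDBC connector jar must also be passed via --jars:

```scala
// Hypothetical example: read a MySQL table and join it with the CSV DataFrame.
// Connection details, table name, and column names below are assumptions.
val mysqlDf = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://master:3306/testdb")
  .option("dbtable", "orders")
  .option("user", "spark")
  .option("password", "secret")
  .load()

// Join on a shared key and inspect the combined result.
val joined = df.join(mysqlDf, df("car_id") === mysqlDf("car_id"))
joined.show()
```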