Apache Spark, add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

I'm trying to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame, using the Scala API. Starting DataFrame:

color 
Red 
Green 
Blue

Desired DataFrame (SQL syntax: CASE WHEN color == Green THEN 1 ELSE 0 END AS bool):

color bool 
Red 0 
Green 1 
Blue 0

How should I implement this logic?


2015-06-11 Leonardo Biagioli

Possible duplicate of [SPARK SQL - case when then](https://stackoverflow.com/questions/25157451/spark-sql-case-when-then)

In the upcoming SPARK 1.4.0 release (due out in the next few days), you can use the when/otherwise syntax:

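import org.apache.spark.sql.functions.when
import sqlContext.implicits._  // provides toDF and the $"..." column syntax

// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")

// Use when/otherwise syntax
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))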
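The same API reproduces the exact output asked for in the question (a column named `bool`), and chaining further `.when(...)` calls gives the multi-branch form CASE WHEN ... WHEN ... ELSE ... END. This is a minimal sketch assuming Spark 2.x or later with a `SparkSession`; the `code` column and its values are a hypothetical illustration, not part of the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Local session for experimenting; in spark-shell, getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("case-when").getOrCreate()
import spark.implicits._

val df = Seq("Red", "Green", "Blue").toDF("color")

// CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool
val withBool = df.withColumn("bool", when($"color" === "Green", 1).otherwise(0))

// Multi-branch CASE: branches are checked in order and the first match wins.
// The "code" column and its values are hypothetical.
val withCode = df.withColumn("code",
  when($"color" === "Red", "R")
    .when($"color" === "Green", "G")
    .otherwise("other"))

withBool.show()
withCode.show()
```

If `otherwise` is left out, rows that match no branch get `null`, just like a SQL CASE without an ELSE.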

If you are using SPARK 1.3.0, you can use a UDF instead:

import org.apache.spark.sql.functions.udf

// Define the UDF: 1 for "Green", 0 for anything else
val isGreen = udf((color: String) => {
    if (color == "Green") 1
    else 0
})
val df2 = df.withColumn("Green_Ind", isGreen($"color"))
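Because the target column is just 0/1, one UDF-free alternative (not part of the answer above) is to cast the boolean comparison to an integer. The two approaches differ on nulls: a null `color` yields `null` with the cast, but `0` with when/otherwise, since a null condition falls through to `otherwise`. A minimal sketch assuming Spark 2.x or later; the null row is added only to show the difference.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

val spark = SparkSession.builder().master("local[*]").appName("bool-cast").getOrCreate()
import spark.implicits._

// The third row has a null color, added to show how nulls are handled.
val df = Seq("Red", "Green", null).toDF("color")

val compared = df
  // Cast of the comparison: true -> 1, false -> 0, null -> null.
  .withColumn("viaCast", ($"color" === "Green").cast("int"))
  // when/otherwise treats a null condition as not matched, so null -> 0.
  .withColumn("viaWhen", when($"color" === "Green", 1).otherwise(0))

compared.show()
```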


2015-06-11 15:18:46 Herman

Thank you very much Herman, it works!

In Spark 1.5.0 you can also use SQL syntax through the expr function:

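import org.apache.spark.sql.functions.expr

// String comparison is case-sensitive, so the literal must match the data ('Green', not 'green')
val df3 = df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))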

or plain Spark SQL:

df.registerTempTable("data")
val df4 = sql("""select *, case when color = 'Green' then 1 else 0 end as Green_ind from data""")
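On Spark 2.x, `registerTempTable` is deprecated in favor of `createOrReplaceTempView`, and `Dataset.selectExpr` accepts the same SQL fragment without needing a view at all. A minimal sketch assuming Spark 2.x or later; the view name `data` and column name `Green_Ind` follow the answer above.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("case-when-sql").getOrCreate()
import spark.implicits._

val df = Seq("Red", "Green", "Blue").toDF("color")

// Same CASE expression inline, no temporary view needed.
val viaSelectExpr = df.selectExpr("*", "CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS Green_Ind")

// Spark 2.x replacement for registerTempTable.
df.createOrReplaceTempView("data")
val viaSql = spark.sql(
  "SELECT *, CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS Green_Ind FROM data")

viaSql.show()
```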


2015-10-28 12:46:07

This also works in Python.

I found this:

https://issues.apache.org/jira/browse/SPARK-3813

This worked for me on Spark 2.1.0:

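import spark.implicits._

// Record is not defined in this snippet; assumed here as a simple case class.
case class Record(key: Int, value: String)

val rdd = spark.sparkContext.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
rdd.toDF().createOrReplaceTempView("records")
println("Result of the CASE query:")
spark.sql("SELECT CASE key WHEN 93 THEN 'ravi' ELSE key END FROM records").collect()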


2017-02-25 20:10:44 ozma

I was looking for this for a long time, so here is an example for SPARK 2.1 in Java with groupBy, for other Java users.

import static org.apache.spark.sql.functions.*; 
//... 
    Column uniqTrue = col("uniq").equalTo(true); 
    Column uniqFalse = col("uniq").equalTo(false); 

    Column testModeFalse = col("testMode").equalTo(false); 
    Column testModeTrue = col("testMode").equalTo(true); 

    Dataset<Row> x = basicEventDataset 
      .groupBy(col(group_field)) 
      .agg(
        sum(when((testModeTrue).and(uniqTrue), 1).otherwise(0)).as("tt"), 
        sum(when((testModeFalse).and(uniqTrue), 1).otherwise(0)).as("ft"), 
        sum(when((testModeTrue).and(uniqFalse), 1).otherwise(0)).as("tf"), 
        sum(when((testModeFalse).and(uniqFalse), 1).otherwise(0)).as("ff") 
      );
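For Scala users, the same conditional-aggregation pattern (a `sum` over `when(...).otherwise(...)` inside `agg`) looks like this. A minimal sketch assuming Spark 2.x or later; the `category`, `uniq` and `testMode` columns and the sample rows are hypothetical stand-ins for `group_field` and `basicEventDataset` in the Java answer, and `countIf` is a hypothetical helper.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, sum, when}

val spark = SparkSession.builder().master("local[*]").appName("conditional-agg").getOrCreate()
import spark.implicits._

// Hypothetical event data: (category, uniq, testMode).
val events = Seq(
  ("a", true, true),
  ("a", true, false),
  ("a", false, false),
  ("b", true, false),
  ("b", true, false)
).toDF("category", "uniq", "testMode")

// Hypothetical helper: counts the rows in each group that satisfy `cond`.
def countIf(cond: Column): Column = sum(when(cond, 1).otherwise(0))

val counts = events
  .groupBy(col("category"))
  .agg(
    countIf(col("testMode") && col("uniq")).as("tt"),
    countIf(!col("testMode") && col("uniq")).as("ft"),
    countIf(col("testMode") && !col("uniq")).as("tf"),
    countIf(!col("testMode") && !col("uniq")).as("ff"))
  .orderBy("category")

counts.show()
```

Summing the 0/1 column produces a long per group, one column per condition.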


2017-12-13 14:03:04
