Decision tree implementation problem in Apache Spark with Java

I am trying to implement a simple decision tree classification demo in Java using Apache Spark 1.0.0. I am basing it on http://spark.apache.org/docs/1.0.0/mllib-decision-tree.html. So far I have written the code listed below.

On the following line of code I get an error:

org.apache.spark.mllib.tree.impurity.Impurity impurity = new org.apache.spark.mllib.tree.impurity.Entropy();

Type mismatch: cannot convert from Entropy to Impurity. This is strange to me, since the Entropy class implements the Impurity interface:

https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/mllib/tree/impurity/Entropy.html

I am looking for an answer to the question: why can't I do this?

package decisionTree; 

import java.util.regex.Pattern; 

import org.apache.spark.api.java.JavaRDD; 
import org.apache.spark.api.java.JavaSparkContext; 
import org.apache.spark.api.java.function.Function; 
import org.apache.spark.mllib.linalg.Vectors; 
import org.apache.spark.mllib.regression.LabeledPoint; 
import org.apache.spark.mllib.tree.DecisionTree; 
import org.apache.spark.mllib.tree.configuration.Algo; 
import org.apache.spark.mllib.tree.configuration.Strategy; 
import org.apache.spark.mllib.tree.impurity.Gini; 
import org.apache.spark.mllib.tree.impurity.Impurity; 

import scala.Enumeration.Value; 

public final class DecisionTreeDemo { 

    static class ParsePoint implements Function<String, LabeledPoint> { 
     private static final Pattern COMMA = Pattern.compile(","); 
     private static final Pattern SPACE = Pattern.compile(" "); 

     @Override 
     public LabeledPoint call(String line) { 
      String[] parts = COMMA.split(line); 
      double y = Double.parseDouble(parts[0]); 
      String[] tok = SPACE.split(parts[1]); 
      double[] x = new double[tok.length]; 
      for (int i = 0; i < tok.length; ++i) { 
       x[i] = Double.parseDouble(tok[i]); 
      } 
      return new LabeledPoint(y, Vectors.dense(x)); 
     } 
    } 

    public static void main(String[] args) throws Exception { 

     if (args.length < 1) { 
      System.err.println("Usage:DecisionTreeDemo <file>"); 
      System.exit(1); 
     } 

     JavaSparkContext ctx = new JavaSparkContext("local[4]", "Log Analizer", 
       System.getenv("SPARK_HOME"), 
       JavaSparkContext.jarOfClass(DecisionTreeDemo.class)); 

     JavaRDD<String> lines = ctx.textFile(args[0]); 
     JavaRDD<LabeledPoint> points = lines.map(new ParsePoint()).cache(); 

     int iterations = 100; 

     int maxBins = 2; 
     int maxMemory = 512; 
     int maxDepth = 1; 

     org.apache.spark.mllib.tree.impurity.Impurity impurity = new org.apache.spark.mllib.tree.impurity.Entropy(); 

     Strategy strategy = new Strategy(Algo.Classification(), impurity, maxDepth, 
       maxBins, null, null, maxMemory); 

     ctx.stop(); 
    } 
}
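ParsePoint expects every input line to have the form label,f1 f2 ... fn: the label, one comma, then the features separated by single spaces. The same splitting logic can be exercised without Spark, which makes it easy to sanity-check a data file before running the job. The sketch below is a plain-Java copy of that logic; LineFormatCheck and its Parsed holder (standing in for LabeledPoint) are hypothetical names, not Spark API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Plain-Java copy of the splitting done by ParsePoint, so the expected
// input format can be checked locally without a Spark context.
public class LineFormatCheck {
    private static final Pattern COMMA = Pattern.compile(",");
    private static final Pattern SPACE = Pattern.compile(" ");

    // Hypothetical stand-in for LabeledPoint: a label plus dense features.
    static final class Parsed {
        final double label;
        final double[] features;

        Parsed(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // Expects "label,f1 f2 ... fn". Malformed lines throw
    // NumberFormatException or ArrayIndexOutOfBoundsException, as in ParsePoint.
    static Parsed parse(String line) {
        String[] parts = COMMA.split(line);
        double y = Double.parseDouble(parts[0]);
        String[] tok = SPACE.split(parts[1]);
        double[] x = new double[tok.length];
        for (int i = 0; i < tok.length; ++i) {
            x[i] = Double.parseDouble(tok[i]);
        }
        return new Parsed(y, x);
    }

    public static void main(String[] args) {
        Parsed p = parse("1,0.5 2.0 3");
        System.out.println(p.label + " " + Arrays.toString(p.features)); // 1.0 [0.5, 2.0, 3.0]
    }
}
```

Because the split is on a single space, a doubled space between two features produces an empty token and a NumberFormatException, exactly as it would inside ParsePoint.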

@samthebest If I remove the impurity variable and change it to the following form:

Strategy strategy = new Strategy(Algo.Classification(), new org.apache.spark.mllib.tree.impurity.Entropy(), maxDepth, maxBins, null, null, maxMemory);

the error changes to: The constructor Entropy() is undefined.
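Both error messages follow from how a Scala singleton looks from Java. In Spark 1.0.0, Entropy (like Gini) is a Scala object, not a class. For a top-level object, scalac emits two JVM classes: a mirror class named Entropy that carries only static forwarder methods and no constructor Java can call, and a module class Entropy$ that actually implements Impurity and keeps its single instance in the static field MODULE$ (the same pattern the answer below uses with DecisionTree$.MODULE$). The Javadoc is generated from the Scala view, so it shows Entropy implementing Impurity even though the mirror class does not. The sketch below is a simplified, hand-written pure-Java model of that layout; ScalaObjectModel and its one-method Impurity are stand-ins, not Spark's classes. Against Spark 1.0.0 itself, the corresponding Java expression should be Entropy$.MODULE$.

```java
// Simplified, hand-written model of the two classes scalac emits for a
// top-level Scala `object Entropy extends Impurity`. These are stand-ins,
// not the real Spark classes, and the trait is reduced to one method.
public class ScalaObjectModel {

    // Stand-in for the Scala trait Impurity, which Java sees as an interface.
    interface Impurity {
        double calculate(double c0, double c1);
    }

    // Module class: holds the single instance in MODULE$ and implements Impurity.
    static final class Entropy$ implements Impurity {
        public static final Entropy$ MODULE$ = new Entropy$();

        private Entropy$() { }

        // Binary entropy of a node with c0 and c1 examples in the two classes.
        @Override
        public double calculate(double c0, double c1) {
            double total = c0 + c1;
            if (total == 0) {
                return 0.0;
            }
            double h = 0.0;
            h -= plogp(c0 / total);
            h -= plogp(c1 / total);
            return h;
        }

        // p * log2(p), using the convention 0 * log2(0) = 0.
        private static double plogp(double p) {
            return p == 0 ? 0.0 : p * Math.log(p) / Math.log(2);
        }
    }

    // Mirror class: static forwarders only, and it does NOT implement Impurity.
    // The private constructor models that Java callers cannot instantiate it.
    static final class Entropy {
        private Entropy() { }

        public static double calculate(double c0, double c1) {
            return Entropy$.MODULE$.calculate(c0, c1);
        }
    }

    public static void main(String[] args) {
        // Impurity wrong = new Entropy();   // rejected by javac: Entropy is not an Impurity
        Impurity impurity = Entropy$.MODULE$; // accepted: the singleton is an Impurity
        System.out.println(impurity.calculate(5, 5)); // even split, entropy of about 1.0
        System.out.println(Entropy.calculate(10, 0)); // pure node, via the static forwarder
    }
}
```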

[edit] I found what I think is the correct way to call the method (https://issues.apache.org/jira/browse/SPARK-2197):

Strategy strategy = new Strategy(Algo.Classification(), new Impurity() { 
@Override 
public double calculate(double arg0, double arg1, double arg2) 
{ return Gini.calculate(arg0, arg1, arg2); } 

@Override 
public double calculate(double arg0, double arg1) 
{ return Gini.calculate(arg0, arg1); } 

}, 5, 100, QuantileStrategy.Sort(), null, 256);

Unfortunately, I still run into errors :(


2014-06-28 caruso

Odd. Try just inlining it instead of assigning it to a variable; after all, you only use the variable once. I would also really recommend using Scala instead of the Java API: you can do the whole thing in literally a few lines, and it will be much more readable. – samthebest

Solution

A Java-friendly fix for Bug 2197 is now available, via this pull request:

Other improvements to Decision Trees for ease of use with Java:
* impurity classes: Added instance() methods to help with Java interface.
* Strategy: Added Java-friendly constructor.
  Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently. I suspect we will redo the API before the other options are included.

You can see a full example that works around your problem, using the instance() method of the Gini impurity, here:

Strategy strategy = new Strategy(Algo.Classification(), Gini.instance(), maxDepth, numClasses,maxBins, categoricalFeaturesInfo); 
DecisionTreeModel model = DecisionTree$.MODULE$.train(rdd.rdd(), strategy);
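The Java-friendly constructor above also takes categoricalFeaturesInfo. In MLlib, an entry n -> k marks feature n as categorical with k categories indexed 0 to k-1; features without an entry are treated as continuous, so an empty map is fine for purely numeric data. Assuming, as in Spark 1.1, that this constructor accepts a java.util.Map<Integer, Integer>, the plain-Java sketch below builds such a map and checks that a feature vector respects it. CategoricalInfo, exampleInfo() and respects() are hypothetical helpers for illustration, not part of Spark, and the arities are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Builds a categoricalFeaturesInfo-style map (feature index -> number of
// categories) and checks that a feature vector is consistent with it.
public class CategoricalInfo {

    // Hypothetical example: feature 1 has 3 categories, feature 4 has 2.
    // Every other feature has no entry and is therefore continuous.
    static Map<Integer, Integer> exampleInfo() {
        Map<Integer, Integer> info = new HashMap<Integer, Integer>();
        info.put(1, 3);
        info.put(4, 2);
        return info;
    }

    // True if every categorical feature holds an integer value in {0, ..., k-1}.
    static boolean respects(double[] features, Map<Integer, Integer> info) {
        for (Map.Entry<Integer, Integer> e : info.entrySet()) {
            int idx = e.getKey();
            int arity = e.getValue();
            if (idx >= features.length) {
                return false; // the vector is too short to contain this feature
            }
            double v = features[idx];
            if (v != Math.floor(v) || v < 0 || v >= arity) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> info = exampleInfo();
        System.out.println(respects(new double[]{0.3, 2.0, 7.1, 9.9, 1.0}, info)); // true
        System.out.println(respects(new double[]{0.3, 3.0, 7.1, 9.9, 1.0}, info)); // false: 3 is not in {0, 1, 2}
        // An empty map means every feature is continuous.
        System.out.println(respects(new double[]{0.3}, new HashMap<Integer, Integer>())); // true
    }
}
```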


2014-08-14 12:17:56 emecas
