How to generate n-grams in Scala?

I am trying to code the dissociated press algorithm based on n-grams in Scala. How do I generate n-grams for a large file? For example, for a file containing "the bee is the bee of the bees":

First, it has to pick a random n-gram. For example, "the bee".
Then it has to find an n-gram starting with the last (n-1) words. For example, "bee of".
It prints the last word of this n-gram. Then it repeats.

Could you please give me some hints on how to do this? Sorry for the inconvenience.

2011-11-24 user1002579

I don't know what an n-gram is. Do you pick the words at random? Or is there some logic? – santiagobasulto

@santiagobasulto Wikipedia is your friend: http://en.wikipedia.org/wiki/N-gram

Is this by any chance related to http://stackoverflow.com/questions/8256830/how-to-make-string ?

Your question could be a bit more specific, but here is my attempt.

val words = "the bee is the bee of the bees"
// sliding(2) yields every window of two consecutive words (the 2-grams)
words.split(' ').sliding(2).foreach(p => println(p.mkString(" ")))

2011-11-24 15:08:46 peri4n

Won't this only give you 2-grams? If n-grams are wanted, then n needs to be parameterized. – tuxdna

You can try this with n as a parameter. Note that it collects every 1-gram up to n-gram, not only the n-grams:

val words = "the bee is the bee of the bees"
val w = words.split(" ")

val n = 4
// For each size i from 1 to n, collect every i-gram, then flatten into one sequence
val ngrams = (for (i <- 1 to n) yield w.sliding(i).map(p => p.toList)).flatten
ngrams foreach println

OUTPUT:

List(the) 
List(bee) 
List(is) 
List(the) 
List(bee) 
List(of) 
List(the) 
List(bees) 
List(the, bee) 
List(bee, is) 
List(is, the) 
List(the, bee) 
List(bee, of) 
List(of, the) 
List(the, bees) 
List(the, bee, is) 
List(bee, is, the) 
List(is, the, bee) 
List(the, bee, of) 
List(bee, of, the) 
List(of, the, bees) 
List(the, bee, is, the) 
List(bee, is, the, bee) 
List(is, the, bee, of) 
List(the, bee, of, the) 
List(bee, of, the, bees)
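
The snippets above only produce the n-grams. The generation steps from the question (pick a random n-gram, look up one that starts with its last n-1 words, emit its last word, repeat) could be sketched roughly as follows. This is only a minimal sketch and not part of the original answer: the names `DissociatedPress`, `generate`, `maxWords` and `rnd` are made up for illustration, and it assumes the whole word list fits in memory and that stopping after `maxWords` words (or when no continuation exists) is acceptable.

```scala
import scala.util.Random

object DissociatedPress {
  // All n-grams of exactly n words, in order of appearance.
  def ngrams(words: Seq[String], n: Int): Vector[Vector[String]] =
    words.sliding(n).filter(_.size == n).map(_.toVector).toVector

  // Hypothetical helper: returns the generated words instead of printing them.
  def generate(words: Seq[String], n: Int, maxWords: Int, rnd: Random): Vector[String] = {
    val grams = ngrams(words, n)
    // Index: (n-1)-word prefix -> every n-gram that starts with it.
    val byPrefix = grams.groupBy(_.init)

    @annotation.tailrec
    def loop(current: Vector[String], acc: Vector[String]): Vector[String] =
      if (acc.size >= maxWords) acc
      else byPrefix.get(current.tail) match {
        case Some(candidates) =>
          // Pick one of the n-grams that continue the current one at random.
          val next = candidates(rnd.nextInt(candidates.size))
          loop(next, acc :+ next.last)
        case None => acc // no n-gram starts with these (n-1) words: stop
      }

    if (grams.isEmpty) Vector.empty
    else {
      val start = grams(rnd.nextInt(grams.size)) // step 1: random n-gram
      loop(start, start).take(maxWords)
    }
  }
}
```

It might be called as `DissociatedPress.generate("the bee is the bee of the bees".split(" ").toSeq, 2, 20, new Random()).mkString(" ")`. Grouping the n-grams by prefix once avoids rescanning the whole list on every step.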

2013-05-24 09:58:58 tuxdna

Here is a stream-based approach. It should not require too much memory while computing the n-grams.

object ngramstream extends App {

  // Applies f to each element of the stream in turn (same effect as st.foreach(f)).
  def process(st: Stream[Array[String]])(f: Array[String] => Unit): Stream[Array[String]] = st match {
    case x #:: xs =>
      f(x)
      process(xs)(f)
    case _ => Stream[Array[String]]()
  }

  // Concatenates the 2-grams, 3-grams, ..., n-grams of words into one Stream.
  // Note: Stream is deprecated since Scala 2.13 in favor of LazyList.
  def ngrams(n: Int, words: Array[String]) = {
    // exclude 1-grams
    (2 to n).map { i => words.sliding(i).toStream }
      .foldLeft(Stream[Array[String]]()) {
        (a, b) => a #::: b
      }
  }

  val words = "the bee is the bee of the bees"
  val n = 4
  val ngrams2 = ngrams(n, words.split(" "))

  process(ngrams2) { x =>
    println(x.toList)
  }

}

OUTPUT:

List(the, bee) 
List(bee, is) 
List(is, the) 
List(the, bee) 
List(bee, of) 
List(of, the) 
List(the, bees) 
List(the, bee, is) 
List(bee, is, the) 
List(is, the, bee) 
List(the, bee, of) 
List(bee, of, the) 
List(of, the, bees) 
List(the, bee, is, the) 
List(bee, is, the, bee) 
List(is, the, bee, of) 
List(the, bee, of, the) 
List(bee, of, the, bees)
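
For the large file mentioned in the question, one option is to skip the intermediate array and slide over an iterator of words, so that only about n words are buffered at a time. A rough sketch, assuming whitespace-separated words and n-grams that may span line breaks; `FileNgrams`, `printNgrams` and the `path` parameter are hypothetical names, not from the question:

```scala
import scala.io.Source

object FileNgrams {
  // Lazily turns lines into n-grams of exactly n words.
  def ngrams(lines: Iterator[String], n: Int): Iterator[Seq[String]] =
    lines
      .flatMap(_.split("\\s+").iterator) // words of each line, in order
      .filter(_.nonEmpty)                // skip blanks from empty or indented lines
      .sliding(n)
      .withPartial(false)                // drop a window shorter than n
      .map(_.toList)

  // Hypothetical helper: `path` is an assumed file name.
  def printNgrams(path: String, n: Int): Unit = {
    val source = Source.fromFile(path)
    try ngrams(source.getLines(), n).foreach(g => println(g.mkString(" ")))
    finally source.close()
  }
}
```

Because `ngrams` takes an `Iterator[String]`, it can be tried without a file, for example `FileNgrams.ngrams(Iterator("the bee is", "the bee of the bees"), 3).foreach(println)`.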

2013-12-17 12:48:58 tuxdna

I like it, though I'm not sure about the usefulness of 'process'. Why not just do 'ngrams(...).foreach(x => println(x.toList))'? – Mortimer

@Mortimer: Interesting question. 'process' is just an extra function. We can certainly use 'ngrams2 foreach { x => println(x.toList) }'. Thanks :-) – tuxdna
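
Following Mortimer's suggestion, and assuming Scala 2.13 or later (where `Stream` is deprecated in favor of `LazyList`), the same idea might look like this without 'process'. `NgramLazyList` is just an illustrative name, not something from the answer:

```scala
object NgramLazyList {
  // All 2-grams up to n-grams as a lazily evaluated sequence (1-grams excluded).
  def ngrams(n: Int, words: Array[String]): LazyList[List[String]] =
    (2 to n).to(LazyList).flatMap(i => words.sliding(i).map(_.toList))
}
```

Usage would then be a plain `foreach`, for example `NgramLazyList.ngrams(4, "the bee is the bee of the bees".split(" ")).foreach(println)`.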
