How do I parse an HTML string in Java?

Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOM element representing it?

2009-09-30 IttayD

I found this somewhere (I don't remember where):

public static DocumentFragment parseXml(Document doc, String fragment) 
{ 
    // Wrap the fragment in an arbitrary element. 
    fragment = "<fragment>"+fragment+"</fragment>"; 
    try 
    { 
     // Create a DOM builder and parse the fragment. 
     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); 
     Document d = factory.newDocumentBuilder().parse(
       new InputSource(new StringReader(fragment))); 

     // Import the nodes of the new document into doc so that they 
     // will be compatible with doc. 
     Node node = doc.importNode(d.getDocumentElement(), true); 

     // Create the document fragment node to hold the new nodes. 
     DocumentFragment docfrag = doc.createDocumentFragment(); 

     // Move the nodes into the fragment. 
     while (node.hasChildNodes()) 
     { 
      docfrag.appendChild(node.removeChild(node.getFirstChild())); 
     } 
     // Return the fragment. 
     return docfrag; 
    } 
    catch (SAXException | ParserConfigurationException | IOException e) 
    { 
     // Parsing failed (for example, the input is not well-formed XML), 
     // or the parser could not be configured or the input could not be read. 
     // Fall through and return null. 
    } 
    return null; 
}
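
Note that the method above runs the input through an XML parser, so it only works when the markup is well-formed (every tag closed, attributes quoted). The string from the question happens to qualify. As a minimal sketch of the same idea without the fragment wrapping, and assuming a hypothetical class name `XmlStringDemo` and helper `parseRoot` that are not part of the original answer:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is an assumption of this sketch.
class XmlStringDemo {

    // Parses a well-formed markup string and returns its root DOM element.
    // Throws an exception if the markup is not well-formed XML.
    static Element parseRoot(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element table = parseRoot("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(table.getTagName());     // table
        System.out.println(table.getTextContent()); // Hello World!
    }
}
```

Real-world HTML with unclosed tags such as `<br>` will make this throw, which is where the lenient parsers mentioned in the other answers come in.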

2009-10-02 12:28:47 IttayD

You could use Swing:

How do you take advantage of the HTML processing capabilities built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.

2009-09-30 13:02:50

You can use HTML Parser, a Java library for parsing HTML in either a linear or a nested fashion. It is an open source tool and can be found on SourceForge.

2009-09-30 13:03:13 nkr1pt

I've used Jericho HTML Parser. It's OSS, it detects (and forgives) badly formatted tags, and it is lightweight.

2009-09-30 13:10:07

Here is one way:

import java.io.*; 
import javax.swing.text.*; 
import javax.swing.text.html.*; 
import javax.swing.text.html.parser.*; 

public class HtmlParseDemo { 
    public static void main(String [] args) throws Exception { 
     Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>"); 
     HTMLEditorKit.Parser parser = new ParserDelegator(); 
     parser.parse(reader, new HTMLTableParser(), true); 
     reader.close(); 
    } 
} 

class HTMLTableParser extends HTMLEditorKit.ParserCallback { 

    private boolean encounteredATableRow = false; 

    public void handleText(char[] data, int pos) { 
     if(encounteredATableRow) System.out.println(new String(data)); 
    } 

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) { 
     if(t == HTML.Tag.TR) encounteredATableRow = true; 
    } 

    public void handleEndTag(HTML.Tag t, int pos) { 
     if(t == HTML.Tag.TR) encounteredATableRow = false; 
    } 
}

2009-09-30 13:10:58

What if I want to put all the pieces of data into an array in the outer class, instead of printing them? – CodyBugstein

@Imray, go ahead, you have my permission to put them into some kind of collection instead of printing them :) –

I put them in a collection inside the 'HTMLTableParser' class, and then created a getter method to retrieve them. Is that the best way to do it? – CodyBugstein
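
For reference, the collection-plus-getter approach described in the comments might look roughly like this. It is only a sketch: the class name `CollectingTableParser`, the `getCells` getter, and the static `parse` helper are hypothetical names, not code from the answer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of HTMLTableParser that stores text instead of printing it.
class CollectingTableParser extends HTMLEditorKit.ParserCallback {

    private final List<String> cells = new ArrayList<>();
    private boolean insideTableRow = false;

    @Override
    public void handleText(char[] data, int pos) {
        // Keep only text that appears between <tr> and </tr>.
        if (insideTableRow) cells.add(new String(data));
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) insideTableRow = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TR) insideTableRow = false;
    }

    // Getter that exposes the collected text to the calling class.
    public List<String> getCells() {
        return cells;
    }

    // Convenience helper (an assumption of this sketch): parse a string
    // and return the text found inside table rows.
    static List<String> parse(String html) throws IOException {
        CollectingTableParser callback = new CollectingTableParser();
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return callback.getCells();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<table><tr><td>Hello</td><td>World!</td></tr></table>"));
    }
}
```

Keeping the list inside the callback and reading it through a getter after `parse` returns keeps the parsing state in one place; the outer class only sees the finished result.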

If you have a string containing HTML, you can use the Jsoup library like this to get the HTML elements:

String htmlTable= "<table><tr><td>Hello World!</td></tr></table>"; 
Document doc = Jsoup.parse(htmlTable); 

// then use something like this to get your element: 
Elements tds = doc.getElementsByTag("td"); 

// tds will contain this one element: <td>Hello World!</td>

Good luck!

2015-04-08 19:39:11 zygimantus

This library just does the job, thanks! – negstek
