How can I strip whitespace-only text nodes from a DOM before serialization?

I have some Java (5.0) code that builds a DOM from various (cached) data sources, then removes certain element nodes that are not required, and finally serializes the result into an XML string using:

// Serialize DOM back into a string 
Writer out = new StringWriter(); 
Transformer tf = TransformerFactory.newInstance().newTransformer(); 
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); 
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); 
tf.setOutputProperty(OutputKeys.INDENT, "no"); 
tf.transform(new DOMSource(doc), new StreamResult(out)); 
return out.toString();

However, because I am removing several element nodes, I end up with a lot of extra whitespace in the final serialized document.

Is there a simple way to remove or collapse the extraneous whitespace from the DOM before (or while) it is serialized into a string?

2009-06-11 Marc Novakowski

You can find the empty text nodes using XPath and then remove them programmatically, like so:

XPathFactory xpathFactory = XPathFactory.newInstance(); 
// XPath to find empty text nodes. 
XPathExpression xpathExp = xpathFactory.newXPath().compile(
     "//text()[normalize-space(.) = '']"); 
NodeList emptyTextNodes = (NodeList) 
     xpathExp.evaluate(doc, XPathConstants.NODESET); 

// Remove each empty text node from document. 
for (int i = 0; i < emptyTextNodes.getLength(); i++) { 
    Node emptyTextNode = emptyTextNodes.item(i); 
    emptyTextNode.getParentNode().removeChild(emptyTextNode); 
}

This approach may be useful if you want more control over node removal than is easily achieved with an XSL template.
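
To see the XPath approach end to end, here is a minimal, self-contained sketch that parses a small document, removes the whitespace-only text nodes, and serializes the result with the same Transformer settings as in the question. The class name `XPathWhitespaceDemo`, the method name `stripAndSerialize`, and the sample XML are made up for illustration; only the XPath expression and the removal loop come from the answer above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class XPathWhitespaceDemo {

    // Parses the XML, drops whitespace-only text nodes, and serializes the DOM.
    static String stripAndSerialize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select every text node that is empty after whitespace normalization.
            XPathExpression expr = XPathFactory.newInstance().newXPath()
                    .compile("//text()[normalize-space(.) = '']");
            NodeList blanks = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }

            // Serialize with the same output properties as in the question.
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            tf.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped for brevity in this sketch; real code would handle each case.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <a>text</a>\n\n  <b/>\n</root>";
        System.out.println(stripAndSerialize(xml));
    }
}
```

Text nodes that contain any non-whitespace character, such as the content of `<a>` here, are not matched by the expression and are left untouched.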

2009-06-11 06:18:41

I like this "code-only" solution even better than the XSL solution, and as you said it gives a bit more control over node removal, if required. –

By the way, this method only seems to work if I first call doc.normalize() before removing the nodes. I'm not sure why that makes a difference. –

Great answer. It works for me even without normalize(). –

Try using the following XSL, with its strip-space element, to serialize your DOM:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

    <xsl:output method="xml" omit-xml-declaration="yes"/> 

    <xsl:strip-space elements="*"/> 

    <xsl:template match="@*|node()"> 
    <xsl:copy> 
    <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
    </xsl:template> 

</xsl:stylesheet>

http://helpdesk.objects.com.au/java/how-do-i-remove-whitespace-from-an-xml-document
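
The stylesheet still has to be applied from Java. A minimal sketch of one way to do that is shown below, assuming the stylesheet is inlined as a string and passed to `TransformerFactory.newTransformer(Source)`; the class name `StripSpaceDemo`, the method name `serializeStripped`, and the sample XML are hypothetical. The parser is made namespace-aware here as a precaution, since XSLT processing generally expects a namespace-aware DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class StripSpaceDemo {

    // The stylesheet from the answer above, inlined as a string for this sketch.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:strip-space elements=\"*\"/>"
            + "<xsl:template match=\"@*|node()\">"
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Parses the XML into a DOM and serializes it through the stylesheet.
    static String serializeStripped(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // precaution for XSLT processing
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Build a Transformer from the stylesheet instead of the default
            // identity transformer used in the question.
            Transformer tf = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped for brevity in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeStripped("<root>\n  <a>text</a>\n  <b/>\n</root>"));
    }
}
```

With this variant the DOM itself is not modified; the whitespace-only text nodes are dropped only in the serialized output.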

2009-06-11 00:58:33 objects

Thanks! That's a good answer; I tried it and it works. –

transformer.setOutputProperty(OutputKeys.INDENT, "yes");

This will retain the XML indentation.

2011-01-05 08:10:17

It does not remove the extra whitespace. –

The code below removes comment nodes and text nodes that contain only whitespace. If a text node has some other content, its value is trimmed.

public static void clean(Node node)
{
    NodeList childNodes = node.getChildNodes();

    // Iterate backwards so that removing a child does not shift
    // the indices of the children still to be visited.
    for (int n = childNodes.getLength() - 1; n >= 0; n--)
    {
        Node child = childNodes.item(n);
        short nodeType = child.getNodeType();

        if (nodeType == Node.ELEMENT_NODE)
            clean(child); // recurse into child elements
        else if (nodeType == Node.TEXT_NODE)
        {
            String trimmedNodeVal = child.getNodeValue().trim();
            if (trimmedNodeVal.length() == 0)
                node.removeChild(child); // whitespace-only text node
            else
                child.setNodeValue(trimmedNodeVal);
        }
        else if (nodeType == Node.COMMENT_NODE)
            node.removeChild(child);
    }
}

Ref: http://www.sitepoint.com/removing-useless-nodes-from-the-dom/
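
Because clean() also trims text nodes that do have content, it changes mixed content, not just inter-element whitespace. The sketch below illustrates this on a small sample; the class name `CleanDemo`, the method name `cleanAndSerialize`, and the sample XML are hypothetical, and the `clean` method is a condensed copy of the one above so that the example compiles on its own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class CleanDemo {

    // Condensed copy of clean() from the answer above.
    static void clean(Node node) {
        NodeList children = node.getChildNodes();
        for (int n = children.getLength() - 1; n >= 0; n--) {
            Node child = children.item(n);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                clean(child);
            } else if (type == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.length() == 0) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            } else if (type == Node.COMMENT_NODE) {
                node.removeChild(child);
            }
        }
    }

    // Parses the XML, cleans the tree, and serializes it without a declaration.
    static String cleanAndSerialize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            clean(doc.getDocumentElement());
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped for brevity in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The spaces around <b> are part of the text, and clean() trims them too.
        System.out.println(cleanAndSerialize("<p>Hello <b>big</b> world<!-- note --></p>"));
    }
}
```

For the sample input the comment is removed and the words end up directly adjacent to the `<b>` element, so this method suits data-oriented XML better than documents with mixed content.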

2013-04-29 18:27:34

One possible approach is to remove the neighboring whitespace at the same time as you remove the target nodes:

private void removeNodeAndTrailingWhitespace(Node node) { 
    List<Node> exiles = new ArrayList<Node>(); 

    exiles.add(node); 
    for (Node whitespace = node.getNextSibling(); 
      whitespace != null && whitespace.getNodeType() == Node.TEXT_NODE && whitespace.getTextContent().matches("\\s*"); 
      whitespace = whitespace.getNextSibling()) { 
     exiles.add(whitespace); 
    } 

    for (Node exile: exiles) { 
     exile.getParentNode().removeChild(exile); 
    } 
}

This has the benefit of keeping the rest of the existing formatting intact.
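
A small usage sketch may make the effect clearer. It removes one element from an indented document and serializes the rest, so that the surviving siblings keep their original line breaks and indentation. The class name `TrailingWhitespaceDemo`, the helper `removeFirst`, and the sample XML are hypothetical; `removeNodeAndTrailingWhitespace` is copied from the answer above (made static) so that the example compiles on its own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class TrailingWhitespaceDemo {

    // Copied from the answer above: removes the node plus any
    // whitespace-only text nodes that immediately follow it.
    static void removeNodeAndTrailingWhitespace(Node node) {
        List<Node> exiles = new ArrayList<Node>();
        exiles.add(node);
        for (Node ws = node.getNextSibling();
                ws != null && ws.getNodeType() == Node.TEXT_NODE
                        && ws.getTextContent().matches("\\s*");
                ws = ws.getNextSibling()) {
            exiles.add(ws);
        }
        for (Node exile : exiles) {
            exile.getParentNode().removeChild(exile);
        }
    }

    // Removes the first element with the given tag name and serializes the rest.
    // Assumes at least one such element exists in the document.
    static String removeFirst(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            removeNodeAndTrailingWhitespace(doc.getElementsByTagName(tagName).item(0));
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped for brevity in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <a/>\n  <b/>\n  <c/>\n</root>";
        System.out.println(removeFirst(xml, "b"));
    }
}
```

Only the whitespace that follows the removed node is dropped, so no blank line is left where `<b/>` used to be, while the indentation in front of `<a/>` and `<c/>` is unchanged.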

2015-01-23 18:34:37 pimlottc

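The following code works. It parses the XML string into a DOM, recursively removes the whitespace-only text nodes, and then serializes the document with an LSSerializer that has pretty-printing turned off: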

public String getSoapXmlFormatted(String pXml) { 
    try { 
     if (pXml != null) { 
      DocumentBuilderFactory tDbFactory = DocumentBuilderFactory 
        .newInstance(); 
      DocumentBuilder tDBuilder; 
      tDBuilder = tDbFactory.newDocumentBuilder(); 
      Document tDoc = tDBuilder.parse(new InputSource(
        new StringReader(pXml))); 
      removeWhitespaces(tDoc); 
      final DOMImplementationRegistry tRegistry = DOMImplementationRegistry 
        .newInstance(); 
      final DOMImplementationLS tImpl = (DOMImplementationLS) tRegistry 
        .getDOMImplementation("LS"); 
      final LSSerializer tWriter = tImpl.createLSSerializer(); 
      tWriter.getDomConfig().setParameter("format-pretty-print", 
        Boolean.FALSE); 
      tWriter.getDomConfig().setParameter(
        "element-content-whitespace", Boolean.TRUE); 
      pXml = tWriter.writeToString(tDoc); 
     } 
    } catch (RuntimeException | ParserConfigurationException | SAXException 
      | IOException | ClassNotFoundException | InstantiationException 
      | IllegalAccessException tE) { 
     tE.printStackTrace(); 
    } 
    return pXml; 
} 

public void removeWhitespaces(Node pRootNode) { 
    if (pRootNode != null) { 
     NodeList tList = pRootNode.getChildNodes(); 
     if (tList != null && tList.getLength() > 0) { 
      ArrayList<Node> tRemoveNodeList = new ArrayList<Node>(); 
      for (int i = 0; i < tList.getLength(); i++) { 
       Node tChildNode = tList.item(i); 
       if (tChildNode.getNodeType() == Node.TEXT_NODE) { 
        if (tChildNode.getTextContent() == null 
          || "".equals(tChildNode.getTextContent().trim())) 
         tRemoveNodeList.add(tChildNode); 
       } else 
        removeWhitespaces(tChildNode); 
      } 
      for (Node tRemoveNode : tRemoveNodeList) { 
       pRootNode.removeChild(tRemoveNode); 
      } 
     } 
    } 
}

2016-07-20 17:54:03 user6615071

This answer would benefit from some explanation. – Eiko
