Regular expression to match URLs in Java

I use RegexBuddy while working with regular expressions. From its library I copied the regular expression to match URLs, and I tested it successfully within RegexBuddy. However, when I copied it using the Java String flavor and pasted it into Java code, it did not work. The following class prints false:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFoo {

    public static void main(String[] args) {
        String regex = "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]";
        String text = "http://google.com";
        System.out.println(IsMatch(text, regex)); // prints false
    }

    private static boolean IsMatch(String s, String pattern) {
        try {
            Pattern patt = Pattern.compile(pattern);
            Matcher matcher = patt.matcher(s);
            return matcher.matches();
        } catch (RuntimeException e) {
            return false;
        }
    }
}

Does anyone know what I am doing wrong?

Source

2008-10-02 Sergio del Amo

Sergio, don't catch RuntimeException. It can introduce subtle bugs and is bad practice overall. If you just want to ignore the case where the expression is invalid, use } catch (PatternSyntaxException pse) {} instead. See item 57 of: http://java.sun.com/docs/books/effective/ – OscarRyz
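A minimal sketch of that suggestion, assuming an invalid pattern should simply count as "no match". The class name SafeRegexMatch and the sample patterns are made up for illustration; only PatternSyntaxException is caught, so other runtime errors still surface.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SafeRegexMatch {

    // Returns true only if the whole input matches the pattern.
    // An invalid pattern is reported as "no match"; other runtime
    // exceptions are deliberately not swallowed.
    public static boolean isMatch(String s, String pattern) {
        try {
            Matcher matcher = Pattern.compile(pattern).matcher(s);
            return matcher.matches();
        } catch (PatternSyntaxException pse) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isMatch("abc", "[a-z]+")); // valid pattern
        System.out.println(isMatch("abc", "[a-z"));   // unclosed character class
    }
}
```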

Or you can use Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE); to avoid changing the regex to match both upper and lower case. – jm4

I know this is really old ('08), but for anyone with a similar problem, RegexBuddy has a "Use" tab. First make sure you select the Java 7 flavor, and then in the "Use" panel you can let it generate Java code for your specific case. This worked well for me. –

Try the following regex string instead. Your Java test is case-sensitive, while the pattern only lists uppercase letters. I have added the lowercase letters as well as a proper start-of-string anchor.

String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

This works too:

String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

Note:

String regex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com> 

String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>

Source

2008-10-02 16:48:55 TomC

Using your regular expression I also get false. –

Did you catch my last edit? I fat-fingered the start-of-string anchor. I just copied it into Eclipse and I get "true". – TomC

Thanks, man. This is the first time I have seen how useful comments on Stack Overflow can be. –

I'll try a standard "Why are you doing it this way?" answer... Do you know about java.net.URL?

URL url = new URL(stringURL);

The above will throw a MalformedURLException if it cannot parse the URL.
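As a rough sketch, that check could be wrapped in a helper like the one below. The class and method names are hypothetical. Note that the URL constructor mainly verifies that the string parses and names a protocol the JVM knows, so it is a looser check than the regex, and newer JDKs mark this constructor as deprecated.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParseCheck {

    // Returns true if java.net.URL accepts the string, false if the
    // constructor rejects it with a MalformedURLException.
    public static boolean isParsableUrl(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isParsableUrl("http://google.com")); // known protocol
        System.out.println(isParsableUrl("not a url"));         // no protocol at all
    }
}
```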

Source

2008-10-02 16:53:38 billjamesdev

I have to go the regular-expression route. What I posted here is as simple as possible to make my question clear. In my program I am using the URL regex inside a more complex regex. –

That's cool. I didn't have a better answer, so I thought I'd post an alternative. I didn't think I'd get downvoted, though. – billjamesdev

You're right, maybe a downvote was a bit too much. "I'll try a standard" sounded a bit annoying. –

This works too:

String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

Note:

String regex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com> 

String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>

So the first one (with \\b) is probably more useful for general use.
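The difference comes from what the two anchors assert: ^ only succeeds at the very start of the input, while \\b succeeds at any word boundary, including the one between "<" and "h". The sketch below illustrates this; the class name and the sample sentence are made up for the example.

```java
import java.util.regex.Pattern;

public class AnchorDemo {

    // URL body shared by all variants (taken from the answers above).
    static final String BODY =
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // True if the regex matches the entire text.
    public static boolean wholeMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // True if the regex matches somewhere inside the text.
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String bracketed = "<http://google.com>";
        String sentence = "see http://google.com for details";

        System.out.println(wholeMatch("<\\b" + BODY + ">", bracketed)); // true
        System.out.println(wholeMatch("<^" + BODY + ">", bracketed));   // false
        System.out.println(found("\\b" + BODY, sentence));              // true
        System.out.println(found("^" + BODY, sentence));                // false
    }
}
```

With \\b the same pattern can be embedded in a larger expression or used with find() on running text, which is why it is the more general choice.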

Source

2008-10-02 17:24:30

When using a regular expression from RegexBuddy's library, make sure to use the same matching modes in your own code as the regex from the library uses. If you generate a source code snippet on the Use tab, RegexBuddy will automatically set the correct matching options in the snippet. If you copy/paste the regex, you have to do that yourself.

In this case, as others have pointed out, you missed the case-insensitive option.
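To illustrate, here is a small sketch that keeps the regex from the question unchanged and only varies the flags passed to Pattern.compile. The class name and the flags parameter are illustrative choices, not anything RegexBuddy generates.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveUrlMatch {

    // The regex from the question, unchanged: its character classes
    // list uppercase letters only.
    static final String LIBRARY_REGEX =
        "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]";

    // Compiles the library regex with the given flags and tests the whole text.
    public static boolean matches(String text, int flags) {
        return Pattern.compile(LIBRARY_REGEX, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "http://google.com";
        System.out.println(matches(text, 0));                        // false
        System.out.println(matches(text, Pattern.CASE_INSENSITIVE)); // true
    }
}
```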

Source

2008-11-09 15:11:59

The problem with all suggested approaches: every RegEx is validating

All RegEx-based code is over-engineered: it will find only valid URLs! As an example, it will ignore anything that starts with "http://" and has non-ASCII characters inside.

Even more: I have encountered 1-2 seconds of processing time (single-threaded, dedicated) with the Java RegEx package (filtering email addresses from text) for very small and simple sentences, nothing specific; possibly a bug in Java 6 RegEx...

The simplest/fastest solution would be to use StringTokenizer to split the text into tokens and to remove the tokens starting with "http://" etc.

If you want to filter emails from text (because later on you will do NLP and so on), just remove all tokens containing "@".
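A rough sketch of that token-based filter, assuming whitespace-separated tokens. The class and method names are hypothetical, and the "https://" prefix is an extra assumption that this answer does not list.

```java
import java.util.StringTokenizer;

public class TokenFilter {

    // Drops every whitespace-separated token that looks like a URL or
    // contains "@"; the remaining tokens are re-joined with single spaces.
    public static String stripUrlsAndEmails(String text) {
        StringTokenizer tokens = new StringTokenizer(text); // default whitespace delimiters
        StringBuilder kept = new StringBuilder();
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            boolean looksLikeUrl = token.startsWith("http://")
                    || token.startsWith("https://") // assumption: not listed in the answer
                    || token.startsWith("ftp://")
                    || token.startsWith("mailto:");
            boolean looksLikeEmail = token.contains("@");
            if (looksLikeUrl || looksLikeEmail) {
                continue;
            }
            if (kept.length() > 0) {
                kept.append(' ');
            }
            kept.append(token);
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripUrlsAndEmails(
            "mail bob@example.com or visit http://google.com today"));
    }
}
```

This removes tokens rather than validating them, so it also drops things that are not well-formed URLs or addresses, such as a Twitter handle starting with "@".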

This is simple text on which the RegEx of Java 6 fails. Try it in different variants of Java. It takes about 1000 milliseconds per RegEx call in a long-running, single-threaded test application:

pattern = Pattern.compile("[A-Za-z0-9](([_\\.\\-]?[a-zA-Z0-9]+)*)@([A-Za-z0-9]+)(([\\.\\-]?[a-zA-Z0-9]+)*)\\.([A-Za-z]{2,})", Pattern.CASE_INSENSITIVE); 

"Avalanna is such a sweet little girl! It would b heartbreaking if cancer won. She's so precious! #BeliebersPrayForAvalanna"); 
"@AndySamuels31 Hahahahahahahahahhaha lol, you don't look like a girl hahahahhaahaha, you are... sexy.";

Don't rely on regular expressions if you only need to filter words with "@", "http://", "ftp://", "mailto:"; it is huge engineering overhead.

If you really want to use RegEx with Java, try Automaton.

Source

2013-01-20 15:23:46

Lol. Automaton does not support capture groups. – user1050755

I don't get your concern. The accepted answer's regex actually works fine for validating URLs. You seem to dismiss it by saying 'it will find only valid URLs!', but that is the goal of the OP's question. What am I missing? – mmcrae

The best way to do it now is:

android.util.Patterns.WEB_URL.matcher(linkUrl).matches();

EDIT: Code of Patterns from https://github.com/android/platform_frameworks_base/blob/master/core/java/android/util/Patterns.java:

/* 
* Copyright (C) 2007 The Android Open Source Project 
* 
* Licensed under the Apache License, Version 2.0 (the "License"); 
* you may not use this file except in compliance with the License. 
* You may obtain a copy of the License at 
* 
*  http://www.apache.org/licenses/LICENSE-2.0 
* 
* Unless required by applicable law or agreed to in writing, software 
* distributed under the License is distributed on an "AS IS" BASIS, 
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
* See the License for the specific language governing permissions and 
* limitations under the License. 
*/ 

package android.util; 

import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

/** 
* Commonly used regular expression patterns. 
*/ 
public class Patterns { 
    /** 
    * Regular expression to match all IANA top-level domains. 
    * List accurate as of 2011/07/18. List taken from: 
    * http://data.iana.org/TLD/tlds-alpha-by-domain.txt 
    * This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py 
    * 
    * @deprecated Due to the recent profileration of gTLDs, this API is 
    * expected to become out-of-date very quickly. Therefore it is now 
    * deprecated. 
    */ 
    @Deprecated 
    public static final String TOP_LEVEL_DOMAIN_STR = 
     "((aero|arpa|asia|a[cdefgilmnoqrstuwxz])" 
     + "|(biz|b[abdefghijmnorstvwyz])" 
     + "|(cat|com|coop|c[acdfghiklmnoruvxyz])" 
     + "|d[ejkmoz]" 
     + "|(edu|e[cegrstu])" 
     + "|f[ijkmor]" 
     + "|(gov|g[abdefghilmnpqrstuwy])" 
     + "|h[kmnrtu]" 
     + "|(info|int|i[delmnoqrst])" 
     + "|(jobs|j[emop])" 
     + "|k[eghimnprwyz]" 
     + "|l[abcikrstuvy]" 
     + "|(mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])" 
     + "|(name|net|n[acefgilopruz])" 
     + "|(org|om)" 
     + "|(pro|p[aefghklmnrstwy])" 
     + "|qa" 
     + "|r[eosuw]" 
     + "|s[abcdeghijklmnortuvyz]" 
     + "|(tel|travel|t[cdfghjklmnoprtvwz])" 
     + "|u[agksyz]" 
     + "|v[aceginu]" 
     + "|w[fs]" 
     + "|(\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\\-\\-0zwm56d|xn\\-\\-11b5bs3a9aj6g|xn\\-\\-3e0b707e|xn\\-\\-45brj9c|xn\\-\\-80akhbyknj4f|xn\\-\\-90a3ac|xn\\-\\-9t4b11yi5a|xn\\-\\-clchc0ea0b2g2a9gcd|xn\\-\\-deba0ad|xn\\-\\-fiqs8s|xn\\-\\-fiqz9s|xn\\-\\-fpcrj9c3d|xn\\-\\-fzc2c9e2c|xn\\-\\-g6w251d|xn\\-\\-gecrj9c|xn\\-\\-h2brj9c|xn\\-\\-hgbk6aj7f53bba|xn\\-\\-hlcj6aya9esc7a|xn\\-\\-j6w193g|xn\\-\\-jxalpdlp|xn\\-\\-kgbechtv|xn\\-\\-kprw13d|xn\\-\\-kpry57d|xn\\-\\-lgbbat1ad8j|xn\\-\\-mgbaam7a8h|xn\\-\\-mgbayh7gpa|xn\\-\\-mgbbh1a71e|xn\\-\\-mgbc0a9azcg|xn\\-\\-mgberp4a5d4ar|xn\\-\\-o3cw4h|xn\\-\\-ogbpf8fl|xn\\-\\-p1ai|xn\\-\\-pgbs0dh|xn\\-\\-s9brj9c|xn\\-\\-wgbh1c|xn\\-\\-wgbl6a|xn\\-\\-xkc2al3hye2a|xn\\-\\-xkc2dl3a5ee0h|xn\\-\\-yfro4i67o|xn\\-\\-ygbi2ammx|xn\\-\\-zckzah|xxx)" 
     + "|y[et]" 
     + "|z[amw])"; 

    /** 
    * Regular expression pattern to match all IANA top-level domains. 
    * @deprecated This API is deprecated. See {@link #TOP_LEVEL_DOMAIN_STR}. 
    */ 
    @Deprecated 
    public static final Pattern TOP_LEVEL_DOMAIN = 
     Pattern.compile(TOP_LEVEL_DOMAIN_STR); 

    /** 
    * Regular expression to match all IANA top-level domains for WEB_URL. 
    * List accurate as of 2011/07/18. List taken from: 
    * http://data.iana.org/TLD/tlds-alpha-by-domain.txt 
    * This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py 
    * 
    * @deprecated This API is deprecated. See {@link #TOP_LEVEL_DOMAIN_STR}. 
    */ 
    @Deprecated 
    public static final String TOP_LEVEL_DOMAIN_STR_FOR_WEB_URL = 
     "(?:" 
     + "(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])" 
     + "|(?:biz|b[abdefghijmnorstvwyz])" 
     + "|(?:cat|com|coop|c[acdfghiklmnoruvxyz])" 
     + "|d[ejkmoz]" 
     + "|(?:edu|e[cegrstu])" 
     + "|f[ijkmor]" 
     + "|(?:gov|g[abdefghilmnpqrstuwy])" 
     + "|h[kmnrtu]" 
     + "|(?:info|int|i[delmnoqrst])" 
     + "|(?:jobs|j[emop])" 
     + "|k[eghimnprwyz]" 
     + "|l[abcikrstuvy]" 
     + "|(?:mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])" 
     + "|(?:name|net|n[acefgilopruz])" 
     + "|(?:org|om)" 
     + "|(?:pro|p[aefghklmnrstwy])" 
     + "|qa" 
     + "|r[eosuw]" 
     + "|s[abcdeghijklmnortuvyz]" 
     + "|(?:tel|travel|t[cdfghjklmnoprtvwz])" 
     + "|u[agksyz]" 
     + "|v[aceginu]" 
     + "|w[fs]" 
     + "|(?:\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\\-\\-0zwm56d|xn\\-\\-11b5bs3a9aj6g|xn\\-\\-3e0b707e|xn\\-\\-45brj9c|xn\\-\\-80akhbyknj4f|xn\\-\\-90a3ac|xn\\-\\-9t4b11yi5a|xn\\-\\-clchc0ea0b2g2a9gcd|xn\\-\\-deba0ad|xn\\-\\-fiqs8s|xn\\-\\-fiqz9s|xn\\-\\-fpcrj9c3d|xn\\-\\-fzc2c9e2c|xn\\-\\-g6w251d|xn\\-\\-gecrj9c|xn\\-\\-h2brj9c|xn\\-\\-hgbk6aj7f53bba|xn\\-\\-hlcj6aya9esc7a|xn\\-\\-j6w193g|xn\\-\\-jxalpdlp|xn\\-\\-kgbechtv|xn\\-\\-kprw13d|xn\\-\\-kpry57d|xn\\-\\-lgbbat1ad8j|xn\\-\\-mgbaam7a8h|xn\\-\\-mgbayh7gpa|xn\\-\\-mgbbh1a71e|xn\\-\\-mgbc0a9azcg|xn\\-\\-mgberp4a5d4ar|xn\\-\\-o3cw4h|xn\\-\\-ogbpf8fl|xn\\-\\-p1ai|xn\\-\\-pgbs0dh|xn\\-\\-s9brj9c|xn\\-\\-wgbh1c|xn\\-\\-wgbl6a|xn\\-\\-xkc2al3hye2a|xn\\-\\-xkc2dl3a5ee0h|xn\\-\\-yfro4i67o|xn\\-\\-ygbi2ammx|xn\\-\\-zckzah|xxx)" 
     + "|y[et]" 
     + "|z[amw]))"; 

    /** 
    * Good characters for Internationalized Resource Identifiers (IRI). 
    * This comprises most common used Unicode characters allowed in IRI 
    * as detailed in RFC 3987. 
    * Specifically, those two byte Unicode characters are not included. 
    */ 
    public static final String GOOD_IRI_CHAR = 
     "a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"; 

    public static final Pattern IP_ADDRESS 
     = Pattern.compile(
      "((25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(25[0-5]|2[0-4]" 
      + "[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(25[0-5]|2[0-4][0-9]|[0-1]" 
      + "[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}" 
      + "|[1-9][0-9]|[0-9]))"); 

    /** 
    * RFC 1035 Section 2.3.4 limits the labels to a maximum 63 octets. 
    */ 
    private static final String IRI 
     = "[" + GOOD_IRI_CHAR + "]([" + GOOD_IRI_CHAR + "\\-]{0,61}[" + GOOD_IRI_CHAR + "]){0,1}"; 

    private static final String GOOD_GTLD_CHAR = 
     "a-zA-Z\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"; 
    private static final String GTLD = "[" + GOOD_GTLD_CHAR + "]{2,63}"; 
    private static final String HOST_NAME = "(" + IRI + "\\.)+" + GTLD; 

    public static final Pattern DOMAIN_NAME 
     = Pattern.compile("(" + HOST_NAME + "|" + IP_ADDRESS + ")"); 

    /** 
    * Regular expression pattern to match most part of RFC 3987 
    * Internationalized URLs, aka IRIs. Commonly used Unicode characters are 
    * added. 
    */ 
    public static final Pattern WEB_URL = Pattern.compile(
     "((?:(http|https|Http|Https|rtsp|Rtsp):\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)" 
     + "\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_" 
     + "\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?" 
     + "(?:" + DOMAIN_NAME + ")" 
     + "(?:\\:\\d{1,5})?)" // plus option port number 
     + "(\\/(?:(?:[" + GOOD_IRI_CHAR + "\\;\\/\\?\\:\\@\\&\\=\\#\\~" // plus option query params 
     + "\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?" 
     + "(?:\\b|$)"); // and finally, a word boundary or end of 
         // input. This is to stop foo.sure from 
         // matching as foo.su 

    public static final Pattern EMAIL_ADDRESS 
     = Pattern.compile(
      "[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" + 
      "\\@" + 
      "[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" + 
      "(" + 
       "\\." + 
       "[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" + 
      ")+" 
     ); 

    /** 
    * This pattern is intended for searching for things that look like they 
    * might be phone numbers in arbitrary text, not for validating whether 
    * something is in fact a phone number. It will miss many things that 
    * are legitimate phone numbers. 
    * 
    * <p> The pattern matches the following: 
    * <ul> 
    * <li>Optionally, a + sign followed immediately by one or more digits. Spaces, dots, or dashes 
    * may follow. 
    * <li>Optionally, sets of digits in parentheses, separated by spaces, dots, or dashes. 
    * <li>A string starting and ending with a digit, containing digits, spaces, dots, and/or dashes. 
    * </ul> 
    */ 
    public static final Pattern PHONE 
     = Pattern.compile(      // sdd = space, dot, or dash 
       "(\\+[0-9]+[\\- \\.]*)?"  // +<digits><sdd>* 
       + "(\\([0-9]+\\)[\\- \\.]*)?" // (<digits>)<sdd>* 
       + "([0-9][0-9\\- \\.]+[0-9])"); // <digit><digit|sdd>+<digit> 

    /** 
    * Convenience method to take all of the non-null matching groups in a 
    * regex Matcher and return them as a concatenated string. 
    * 
    * @param matcher  The Matcher object from which grouped text will 
    *      be extracted 
    * 
    * @return    A String comprising all of the non-null matched 
    *      groups concatenated together 
    */ 
    public static final String concatGroups(Matcher matcher) { 
     StringBuilder b = new StringBuilder(); 
     final int numGroups = matcher.groupCount(); 

     for (int i = 1; i <= numGroups; i++) { 
      String s = matcher.group(i); 

      if (s != null) { 
       b.append(s); 
      } 
     } 

     return b.toString(); 
    } 

    /** 
    * Convenience method to return only the digits and plus signs 
    * in the matching string. 
    * 
    * @param matcher  The Matcher object from which digits and plus will 
    *      be extracted 
    * 
    * @return    A String comprising all of the digits and plus in 
    *      the match 
    */ 
    public static final String digitsAndPlusOnly(Matcher matcher) { 
     StringBuilder buffer = new StringBuilder(); 
     String matchingRegion = matcher.group(); 

     for (int i = 0, size = matchingRegion.length(); i < size; i++) { 
      char character = matchingRegion.charAt(i); 

      if (character == '+' || Character.isDigit(character)) { 
       buffer.append(character); 
      } 
     } 
     return buffer.toString(); 
    } 

    /** 
    * Do not create this static utility class. 
    */ 
    private Patterns() {} 
}

Source

2013-09-20 11:18:08 squixy

+1 for you! Thank you so much!!! This is great code! Everyone is trying this with difficult regexes when it can be this easy. Great! –

This should be the correct answer – JPM

@JPM Except the OP was looking for a Java solution, not an Android-specific one (it is easy to forget to check a question's specific tags). Still, it is a good thing for people coding for Android to know about, so I upvoted. – indivisible

In line with billjamesdev's answer, here is another approach to validating a URL without using RegEx:

From the Apache Commons Validator lib, look at the UrlValidator class. Some example code:

Construct a UrlValidator with valid schemes of "http" and "https".

String[] schemes = {"http","https"};
UrlValidator urlValidator = new UrlValidator(schemes); 
if (urlValidator.isValid("ftp://foo.bar.com/")) { 
    System.out.println("url is valid"); 
} else { 
    System.out.println("url is invalid"); 
} 

prints "url is invalid"

If instead the default constructor is used:

UrlValidator urlValidator = new UrlValidator(); 
if (urlValidator.isValid("ftp://foo.bar.com/")) { 
    System.out.println("url is valid"); 
} else { 
    System.out.println("url is invalid"); 
}

prints "url is valid"

Source

2016-05-06 09:58:29 Cavaleiro
