preg_match_all

(PHP 4, PHP 5, PHP 7, PHP 8)

preg_match_all - Perform a global regular expression match

Description

preg_match_all(
    string $pattern,
    string $subject,
    array &$matches = null,
    int $flags = 0,
    int $offset = 0
): int|false

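Searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags.

After the first match is found, each subsequent search continues from the end of the previous match.

Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

A multi-dimensional array of all matches, ordered according to flags.

flags

Can be a combination of the following flags (note that it makes no sense to use PREG_PATTERN_ORDER together with PREG_SET_ORDER):

PREG_PATTERN_ORDER

Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.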

<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
 "<b>example: </b><div align=left>this is a test</div>",
 $out, PREG_PATTERN_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, <div align=left>this is a test</div>
example: , this is a test

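So, $out[0] contains an array of strings that matched the full pattern, and $out[1] contains an array of strings enclosed by tags.

If the pattern contains named subpatterns, $matches additionally contains entries for keys with the subpattern name.

If the pattern contains duplicate named subpatterns, only the rightmost subpattern is stored in $matches[NAME].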

<?php
preg_match_all(
 '/(?J)(?<match>foo)|(?<match>bar)/',
 'foo bar',
 $matches,
 PREG_PATTERN_ORDER
);
print_r($matches['match']);
?>

The above example will output:

Array
(
    [0] => 
    [1] => bar
)

PREG_SET_ORDER

Orders results so that $matches[0] is an array of the first set of matches, $matches[1] is an array of the second set of matches, and so on.

<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
 "<b>example: </b><div align=\"left\">this is a test</div>",
 $out, PREG_SET_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, example:
<div align="left">this is a test</div>, this is a test

PREG_OFFSET_CAPTURE

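If this flag is passed, the string offset (in bytes) of every match is returned as well. Note that this changes the value of matches into an array of arrays, where every element is an array consisting of the matched string at index 0 and its string offset into subject at index 1.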

<?php
preg_match_all('/(foo)(bar)(baz)/', 'foobarbaz', $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

The above example will output:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => foobarbaz
                    [1] => 0
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => foo
                    [1] => 0
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => bar
                    [1] => 3
                )

        )

    [3] => Array
        (
            [0] => Array
                (
                    [0] => baz
                    [1] => 6
                )

        )

)

PREG_UNMATCHED_AS_NULL

If this flag is passed, unmatched subpatterns are reported as null; otherwise they are reported as an empty string.

If no order flag is given, PREG_PATTERN_ORDER is assumed.
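The difference made by PREG_UNMATCHED_AS_NULL is easiest to see with two alternative groups, where only one group takes part in each match. The following is a minimal sketch, assuming PHP 7.4 or later; the pattern and variable names are only illustrative.

```php
<?php
// Two alternative groups: in each match, only one of them participates.
$pattern = '/(a)|(b)/';

preg_match_all($pattern, 'ab', $default);
preg_match_all($pattern, 'ab', $asNull, PREG_UNMATCHED_AS_NULL);

// Group 2 did not take part in the first match ("a").
var_dump($default[2][0]); // empty string
var_dump($asNull[2][0]);  // null
```

With the flag set, a group that matched an empty string can be told apart from a group that did not participate at all.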

offset

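Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify an alternate place from which to start the search (in bytes).

Note:
Using offset is not equivalent to passing substr($subject, $offset) to preg_match_all() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). See preg_match() for examples.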
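As an illustration of the note above, the following sketch compares offset, substr() and the \G assertion on the same subject. The subject string and variable names are made up for the example.

```php
<?php
$subject = 'abcdef';

// With $offset, ^ still refers to the start of the whole subject: no match.
$withOffset = preg_match_all('/^def/', $subject, $m1, 0, 3);

// With substr(), "def" becomes the start of a new string, so ^ matches.
$withSubstr = preg_match_all('/^def/', substr($subject, 3), $m2);

// \G anchors at the position where the search starts, i.e. at the offset.
$withG = preg_match_all('/\Gdef/', $subject, $m3, 0, 3);

var_dump($withOffset, $withSubstr, $withG);
```

Using \G keeps look-behind assertions working against the full subject while still anchoring the first match at the offset.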

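Return Values

Returns the number of full pattern matches (which might be zero), or false on failure.

Errors/Exceptions

If the regex pattern passed does not compile to a valid regex, an E_WARNING is emitted.

Changelog

Version	Description
7.2.0	The `$flags` parameter now supports `PREG_UNMATCHED_AS_NULL`.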

Examples

Example #1 Getting all phone numbers out of some text.

<?php
preg_match_all("/\(? (\d{3})? \)? (?(1) [\-\s] ) \d{3}-\d{4}/x",
 "Call 555-1212 or 1-800-555-1212", $phones);
?>

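Example #2 Find matching HTML tags (greedy)

<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.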
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
 echo "matched: " . $val[0] . "\n";
 echo "part 1: " . $val[1] . "\n";
 echo "part 2: " . $val[2] . "\n";
 echo "part 3: " . $val[3] . "\n";
 echo "part 4: " . $val[4] . "\n\n";
}
?>

The above example will output:

matched: <b>bold text</b>
part 1: <b>
part 2: b
part 3: bold text
part 4: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: a
part 3: click me
part 4: </a>

Example #3 Using named subpatterns

<?php

$str = <<<FOO
a: 1
b: 2
c: 3
FOO;

preg_match_all('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

/* Alternative */
// preg_match_all('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

print_r($matches);

?>

The above example will output:

Array
(
    [0] => Array
        (
            [0] => a: 1
            [1] => b: 2
            [2] => c: 3
        )

    [name] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [1] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [digit] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

)

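See Also

PCRE Patterns
preg_quote() - Quote regular expression characters
preg_match() - Perform a regular expression match
preg_replace() - Perform a regular expression search and replace
preg_split() - Split string by a regular expression
preg_last_error() - Returns the error code of the last PCRE regex execution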

User Contributed Notes 38 notes

buuh ¶

13 years ago

If you want to extract all {token}s from a string:

<?php
$pattern = "/{[^}]*}/";
$subject = "{token1} foo {token2} bar";
preg_match_all($pattern, $subject, $matches);
print_r($matches);
?>

Output:

Array
(
    [0] => Array
        (
            [0] => {token1}
            [1] => {token2}
        )

)

harrybarrow at mail dot ru ¶

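3 years ago

preg_match_all() and the other preg_*() functions do not work well with very long strings (at least longer than 1 MB).
In that case the function returns FALSE and the value of $matches is unpredictable: it may contain some values or it may be empty.
The workaround is to pre-split the long string into parts, for example with explode() on some criterion, and then apply preg_match_all() to each part.
A typical scenario for this is log analysis with regular expressions.
Tested on PHP 7.2.0.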
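Whatever the cause of a failure, a return value of false has to be told apart from 0 (no matches), and preg_last_error() reports why the engine gave up. The sketch below does not use a long string; as an assumption made for the sake of a deterministic example, it triggers a failure with invalid UTF-8 input under the /u modifier.

```php
<?php
// "\xff" is not valid UTF-8, so matching with the /u modifier fails at run time.
$result = preg_match_all('/./u', "abc\xff", $matches);

if ($result === false) {
    // preg_last_error() returns one of the PREG_*_ERROR constants.
    $errorCode = preg_last_error();
    echo "preg_match_all failed with error code $errorCode\n";
} else {
    echo "found $result matches\n";
}
```

A plain `if (!$result)` check would treat "no matches" and "matching failed" the same way, which is why the strict comparison is used here.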

mnc at u dot nu ¶

18 years ago

PREG_OFFSET_CAPTURE always seems to provide byte offsets rather than character position offsets, even when you are using the unicode /u modifier.
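A short sketch of the behaviour described above, together with one possible way to turn a byte offset into a character offset without any extension: count the code points in the prefix with a second preg_match_all() call. The subject string is made up for the example.

```php
<?php
// "\u{e9}" is the letter e with acute accent, which occupies two bytes in UTF-8.
$subject = "a\u{e9}\u{e9}b";

preg_match_all('/b/u', $subject, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][0][1];

// Count the code points before the match to get a character offset.
$charOffset = preg_match_all('/./su', substr($subject, 0, $byteOffset));

echo "byte offset: $byteOffset, character offset: $charOffset\n";
```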

stas kuryan aka stafox ¶

9 years ago

Here is an awesome online regex editor: https://regex101.com/
It helps you test your regular expressions (pcre, js, python) with real-time highlighting of regex matches as you type.

Daniel Klein ¶

9 years ago

The code that john at mccarthy dot net posted is not necessary. If you want your results grouped by individual match, simply use:

<?php
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
?>

For example:

<?php
preg_match_all('/([GH])([12])([!?])/', 'G1? H2!', $matches); // Default PREG_PATTERN_ORDER
// $matches = array(0 => array(0 => 'G1?', 1 => 'H2!'),
// 1 => array(0 => 'G', 1 => 'H'),
// 2 => array(0 => '1', 1 => '2'),
// 3 => array(0 => '?', 1 => '!'))

preg_match_all('/([GH])([12])([!?])/', 'G1? H2!', $matches, PREG_SET_ORDER);
// $matches = array(0 => array(0 => 'G1?', 1 => 'G', 2 => '1', 3 => '?'),
// 1 => array(0 => 'H2!', 1 => 'H', 2 => '2', 3 => '!'))
?>
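Grouping by match combines well with named subpatterns, because each set can then be read like a small record. A minimal sketch, with an input string and group names chosen only for illustration:

```php
<?php
$input = 'width=10 height=20 depth=5';

preg_match_all('/(?<key>\w+)=(?<value>\d+)/', $input, $sets, PREG_SET_ORDER);

$pairs = [];
foreach ($sets as $set) {
    // Each $set holds one complete match: [0], ['key'], [1], ['value'], [2].
    $pairs[$set['key']] = (int) $set['value'];
}
print_r($pairs);
```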

fab ¶

12 years ago

Here is a function that replaces every number in a string with that number minus one.

<?php
function decremente_chaine($chaine)
{
    // Get every number in the string together with its byte offset
    preg_match_all("/[0-9]+/", $chaine, $out, PREG_OFFSET_CAPTURE);
    // Walk through the occurrences
    for ($i = 0; $i < sizeof($out[0]); $i++) {
        $longueurnombre = strlen((string)$out[0][$i][0]);
        $taillechaine = strlen($chaine);
        // Split the string into 3 parts
        $debut = substr($chaine, 0, $out[0][$i][1]);
        $milieu = ($out[0][$i][0]) - 1;
        $fin = substr($chaine, $out[0][$i][1] + $longueurnombre, $taillechaine);
        // For 10, 100, 1000, etc. the result has one digit fewer,
        // so shift every later offset back by 1
        if (preg_match('#^10+$#', $out[0][$i][0])) {
            for ($j = $i + 1; $j < sizeof($out[0]); $j++) {
                $out[0][$j][1] = $out[0][$j][1] - 1;
            }
        }
        $chaine = $debut . $milieu . $fin;
    }
    return $chaine;
}
?>
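The same result can be had without tracking offsets by hand. The sketch below swaps in a different technique, preg_replace_callback(), which calls a function for every match and substitutes its return value; the helper name decrement_numbers is hypothetical and not part of the note above.

```php
<?php
// Hypothetical helper name; not part of the note above.
function decrement_numbers(string $text): string
{
    return preg_replace_callback(
        '/[0-9]+/',
        // The callback receives one match at a time; its return value replaces it.
        function (array $m): string {
            return (string) ((int) $m[0] - 1);
        },
        $text
    );
}

echo decrement_numbers('node[10]/item[1]'), "\n";
```

Because the engine performs the substitution itself, the change in string length when 10 becomes 9 needs no special handling.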

bruha ¶

16 years ago

To count the string length of a UTF-8 string I use:

$count = preg_match_all("/[[:print:]\pL]/u", $str, $pockets);

where
[:print:] - printing characters, including space
\pL - UTF-8 letter
/u - UTF-8 string
For other Unicode character properties see http://www.pcre.org/pcre.txt

phpnet at sinful-music dot com ¶

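18 years ago

Here is some loose code to 1. validate an address list against the RFC 2822 standard and 2. extract the addr-spec (the part commonly known as the "email"). I would not suggest using it for email checking in input forms, but it might be just what you want for other email applications. I know it can be optimized further, but that part I will leave up to you nutcrackers. The total length of the resulting regex is about 30000 bytes. That is because it accepts comments. You can remove them by setting $cfws to $fws, and it shrinks to about 6000 bytes. Conformity checking refers absolutely and strictly to RFC 2822. Have fun, and email me if you have any enhancements!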

<?php
function mime_extract_rfc2822_address($string)
{
 // rfc2822 tokens setup
 $crlf = "(?:\r\n)";
 $wsp = "[\t ]";
 $text = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";
 $quoted_pair = "(?:\\\\$text)";
 $fws = "(?:(?:$wsp*$crlf)?$wsp+)";
 $ctext = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .
 "!-'*-[\\]-\\x7F]";
 $comment = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .
 "$fws?\\))";
 $cfws = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";
 //$cfws = $fws; //an alternative to comments
 $atext = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";
 $atom = "(?:$cfws?$atext+$cfws?)";
 $dot_atom_text = "(?:$atext+(?:\\.$atext+)*)";
 $dot_atom = "(?:$cfws?$dot_atom_text$cfws?)";
 $qtext = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";
 $qcontent = "(?:$qtext|$quoted_pair)";
 $quoted_string = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";
 $dtext = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";
 $dcontent = "(?:$dtext|$quoted_pair)";
 $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";
 $domain = "(?:$dot_atom|$domain_literal)";
 $local_part = "(?:$dot_atom|$quoted_string)";
 $addr_spec = "($local_part@$domain)";
 $display_name = "(?:(?:$atom|$quoted_string)+)";
 $angle_addr = "(?:$cfws?<$addr_spec>$cfws?)";
 $name_addr = "(?:$display_name?$angle_addr)";
 $mailbox = "(?:$name_addr|$addr_spec)";
 $mailbox_list = "(?:(?:(?:(?<=:)|,)$mailbox)+)";
 $group = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";
 $address = "(?:$mailbox|$group)";
 $address_list = "(?:(?:^|,)$address)+";

 //output the length of the string (just to show how long it is)
 echo(strlen($address_list) . " ");

 //apply the expression
 preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);

 return $array;
};
?>

chuckie ¶

17 years ago

Here is a function that converts byte offsets into (UTF-8) character offsets (this works regardless of whether you use the /u modifier):

<?php

function mb_preg_match_all($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = PREG_PATTERN_ORDER, $pn_offset = 0, $ps_encoding = NULL) {
    // WARNING! - All this function does is correct the offsets, nothing else:
    //
    if (is_null($ps_encoding))
        $ps_encoding = mb_internal_encoding();

    $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));
    $ret = preg_match_all($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);

    if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE)) {
        foreach ($pa_matches as &$ha_group) {
            foreach ($ha_group as &$ha_match) {
                // Unmatched groups carry offset -1 and are left as they are
                if (is_array($ha_match) && $ha_match[1] >= 0)
                    $ha_match[1] = mb_strlen(substr($ps_subject, 0, $ha_match[1]), $ps_encoding);
            }
            unset($ha_match);
        }
        unset($ha_group);
    }
    //
    // (the code is independent of PREG_PATTERN_ORDER / PREG_SET_ORDER)

    return $ret;
}

?>

spambegone at cratemedia dot com ¶

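16 years ago

I found simpleXML to be useful only when the XML is extremely small; otherwise the server would run out of memory (I suspect a memory leak or something similar?). So while searching for alternative parsers, I decided to try a simpler approach. I do not know how this compares in CPU usage, but I know it works with large XML structures. It is more of a manual method, but it works for me because I always know what structure of data I will be receiving.

Essentially I just use preg_match() for the unique nodes I am looking for values on, and preg_match_all() to find multiple nodes. This puts the results in an array, and I can then process the data as I please.

I was unhappy, though, that preg_match_all() stores the data twice (requiring twice the memory): one array for all the full pattern matches, and one array for all the subpattern matches. You could probably write your own function that overcomes this. But for now this works for me, and I hope it saves someone else some time as well.

// Sample XML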
<RETS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="14" />
<DELIMITER value="09" />
<COLUMNS>PropertyID</COLUMNS>
<DATA>521897</DATA>
<DATA>677208</DATA>
<DATA>686037</DATA>
</RETS>


<?php

// Sample function
function parse_xml($xml) {

    $results = array();

    // Get the delimiter (single instance)
    preg_match('/<DELIMITER value ?= ?"(.*?)" ?\/>/', $xml, $matches);
    if (!empty($matches[1])) {
        $results["delimiter"] = chr((int) $matches[1]);
    } else {
        // Default delimiter
        $results["delimiter"] = "\t";
    }
    unset($matches);

    // Get the DATA nodes (multiple instances)
    $results["data_count"] = preg_match_all("/<DATA>(.*?)<\/DATA>/", $xml, $matches);
    // Keep the subpattern matches, discard the rest
    $results["data"] = $matches[1];
    unset($matches);

    // Release the XML to save memory (it should also be released outside the function)
    unset($xml);

    // Return the results array
    return $results;
}

?>

meaneye at mail dot com ¶

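16 years ago

Recently I had to write a search engine in Hebrew and ran into a huge number of problems. My data was stored in a MySQL table with utf8_bin encoding.

So, to be able to write Hebrew in a utf8 table you need to do:
<?php
$prepared_text = addslashes(utf8_encode($text));
?>

But then I had to find out whether some word exists in the stored text. This is where I got stuck. A simple preg_match would not find the text, since Hebrew is not that easy. I tried /u and all sorts of other things.

The solution was somewhat logical and simple...
<?php
$db_text = bin2hex(stripslashes(utf8_decode($db_text)));
$word = bin2hex($word);

$found = preg_match_all("/($word)+/i", $db_text, $matches);
?>

I used preg_match_all because it returns the number of occurrences, so I could sort the search results by that count.

Hope someone finds this useful!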

john at mccarthy dot net ¶

13 years ago

I needed a function to rotate the results of a preg_match_all query, and wrote this. Not sure if one already exists.


<?php
function turn_array($m)
{
    // Start from an empty array so that empty input returns an array too
    $rt = array();
    for ($z = 0; $z < count($m); $z++)
    {
        for ($x = 0; $x < count($m[$z]); $x++)
        {
            $rt[$x][$z] = $m[$z][$x];
        }
    }

    return $rt;
}
?>

Example - take the results of some preg_match_all query:

Array
(
    [0] => Array
        (
            [1] => Banff
            [2] => Canmore
            [3] => Invermere
        )

    [1] => Array
        (
            [1] => AB
            [2] => AB
            [3] => BC
        )

    [2] => Array
        (
            [1] => 51.1746254
            [2] => 51.0938416
            [3] => 50.5065193
        )

    [3] => Array
        (
            [1] => -115.5719757
            [2] => -115.3517761
            [3] => -116.0321884
        )

    [4] => Array
        (
            [1] => T1L 1B3
            [2] => T1W 1N2
            [3] => V0B 2G0
        )

)

Rotate it 90 degrees to group the results as records:

Array
(
    [0] => Array
        (
            [1] => Banff
            [2] => AB
            [3] => 51.1746254
            [4] => -115.5719757
            [5] => T1L 1B3
        )

    [1] => Array
        (
            [1] => Canmore
            [2] => AB
            [3] => 51.0938416
            [4] => -115.3517761
            [5] => T1W 1N2
        )

    [2] => Array
        (
            [1] => Invermere
            [2] => BC
            [3] => 50.5065193
            [4] => -116.0321884
            [5] => V0B 2G0
        )

)

stamster at gmail dot com ¶

8 years ago

Be careful with this kind of pattern and large input buffers in the preg_match_* functions.

<?php
$pattern = '/\{(?:[^{}]|(?R))*\}/';

preg_match_all($pattern, $buffer, $matches); 
?>

If $buffer is 80+ KB in size, you will end up with a segfault!

[89396.588854] php[4384]: segfault at 7ffd6e2bdeb0 ip 00007fa20c8d67ed sp 00007ffd6e2bde70 error 6 in libpcre.so.3.13.1[7fa20c8c3000+3c000]

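This is caused by PCRE recursion. It has been a known bug in PHP since 2008, but its source is not PHP itself; it is the PCRE library.

Rasmus Lerdorf has the answer: https://bugs.php.net/bug.php?id=45735#1365812629

"The problem here is that there is no way to detect run-away regular expressions without huge performance and memory penalties.
Yes, we could build PCRE in a way that would not segfault, and we could crank up the default backtrack limit
to something huge, but that would slow down every regex call by a lot. If PCRE
provided a way to handle this more gracefully without the performance hit,
we would of course use it."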

ad ¶

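15 years ago

I wrote a simple function to extract a phone number from a string.

I am not sure how good it is, but it works.

It keeps only the digits 0-9 and the characters "-", " ", "(", ")", "." and "+". As far as I know, these are the characters most commonly used in phone numbers.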


<?php
function clean_phone_number($phone) {
    if (!empty($phone)) {
        // Collect every character that may appear in a phone number
        preg_match_all('/[0-9\(\)+.\- ]/s', $phone, $cleaned);
        $ready = implode('', $cleaned[0]);
        if (strlen($ready) > 4 && strlen($ready) <= 25) {
            return $ready;
        }
        else {
            return false;
        }
    }
    return false;
}
?>

marc ¶

12 years ago

It is better to use preg_replace to convert text into clickable links with <a> tags:

$html = preg_replace('"\b(http://\S+)"', '<a href="$1">$1</a>', $text);

no at bo dot dy ¶

14 years ago

For parsing queries that contain entities, use:


<?php 
preg_match_all("/(?:^|(?<=\&(?![a-z]+\;)))([^\=]+)=(.*?)(?:$|\&(?![a-z]+\;))/i", 
 $s, $m, PREG_SET_ORDER ); 
?>

sledge NOSPAM ¶

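16 years ago

Perhaps you want to find the positions of all anchor tags. This returns a two-dimensional array holding the starting and ending positions.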

<?php
function getTagPositions($strBody)
{
 if (!defined('DEBUG')) define('DEBUG', false);
 if (!defined('DEBUG_FILE_PREFIX')) define('DEBUG_FILE_PREFIX', "/tmp/findlinks_");
 
 preg_match_all("/<[^>]+>(.*)<\/[^>]+>/U", $strBody, $strTag, PREG_PATTERN_ORDER);
 $intOffset = 0;
 $intIndex = 0;
 $intTagPositions = array();

 foreach($strTag[0] as $strFullTag) {
 if(DEBUG == true) {
 $fhDebug = fopen(DEBUG_FILE_PREFIX.time(), "a");
 fwrite($fhDebug, $strFullTag."\n");
 fwrite($fhDebug, "Starting position: ".strpos($strBody, $strFullTag, $intOffset)."\n");
 fwrite($fhDebug, "Ending position: ".(strpos($strBody, $strFullTag, $intOffset) + strlen($strFullTag))."\n");
 fwrite($fhDebug, "Length: ".strlen($strFullTag)."\n\n");
 fclose($fhDebug);
 }
 $intTagPositions[$intIndex] = array('start' => (strpos($strBody, $strFullTag, $intOffset)), 'end' => (strpos($strBody, $strFullTag, $intOffset) + strlen($strFullTag)));
 $intOffset += strlen($strFullTag);
 $intIndex++;
 }
 return $intTagPositions;
}

$strBody = 'I have lots of <a href="http://my.site.com">links</a> on this <a href="http://my.site.com">page</a> that I want to <a href="http://my.site.com">find</a> the positions.';

$strBody = strip_tags(html_entity_decode($strBody), '<a>');
$intTagPositions = getTagPositions($strBody);
print_r($intTagPositions);

/*****
Output:

Array ( 
 [0] => Array ( 
 [start] => 15 
 [end] => 53 ) 
 [1] => Array ( 
 [start] => 62 
 [end] => 99 ) 
 [2] => Array ( 
 [start] => 115 
 [end] => 152 )
 ) 
*****/
?>

loretoparisi at gmail dot com ¶

1 year ago

A multibyte-safe preg_match_all function that fixes the capture offsets when PREG_OFFSET_CAPTURE is used on UTF-8 strings:
 
<?php 
function mb_preg_match_all($pattern, $subject, &$matches = null, $flags = 0, $offset = 0) {
 $out=preg_match_all($pattern, $subject, $matches, $flags, $offset);
 if($flags & PREG_OFFSET_CAPTURE && is_array($matches) && count($matches)>0) {
 foreach ($matches[0] as &$match) {
 $match[1] = mb_strlen(substr($subject, 0, $match[1]));
 }
 }
 return $out;
}
?>

biziclop at vipmail dot hu ¶

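2 years ago

Sometimes you do not just want to pick out the matching strings; you need the entire subject to be made up of matching substrings, so that every character of the subject is part of a match. None of the existing preg_* functions is easily applicable to this task, so I made the preg_match_entire() function.
It uses the (*MARK) syntax, which is documented here: https://pcre.org/original/doc/html/pcrepattern.html#SEC27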

<?php 

// Returns: the array of matches
// null if the string is not a repetition of the pattern
// false on error
function preg_match_entire( string $pattern, string $subject, int $flags = 0 ){
 // Rebuild and wrap the pattern
 $delimiter = $pattern[0];
 $ldp = strrpos( $pattern, $delimiter );
 // Read the modifiers before $pattern is overwritten
 $modifiers = substr( $pattern, $ldp + 1 );
 $pattern = substr( $pattern, 1, $ldp - 1 );
 $pattern = "{$delimiter} \G\z (*MARK:END) | \G (?:{$pattern}) {$delimiter}x{$modifiers}";
 $r = preg_match_all( $pattern, $subject, $m, PREG_SET_ORDER | $flags );
 if( $r === false ) return false; // error
 $end = array_pop( $m );
 if( $end === null || ! isset( $end['MARK']) || $end['MARK'] !== 'END')
 return null; // end of string not reached
 return $m; // return the actual matches, may be an empty array
}

// Same results:
test('#{\d+}#', ''); // []
test('#{\d+}#', '{11}{22}{33}'); // {11},{22},{33}

// Different results: preg_match_entire will not match this:
test('#{\d+}#', '{11}{}{aa}{22},{{33}}');
// preg_match_entire: null
// preg_match_all: {11},{22},{33}

function test( $pattern, $subject ){
 echo "pattern: $pattern\n";
 echo "subject: $subject\n";
 print_matches('preg_match_entire: ', preg_match_entire( $pattern, $subject ));
 preg_match_all( $pattern, $subject, $matches, PREG_SET_ORDER );
 print_matches('preg_match_all: ', $matches );
 echo "\n";
}
function print_matches( $t, $m ){
 echo $t, is_array( $m ) && $m ? implode(',', array_column( $m, 0 )) : json_encode( $m ), "\n";
} ?>

rajudec at gmail dot com ¶

2 years ago

<?php
// Allow limited span formatting in html text

$str='<span style="text-decoration-line: underline; font-weight: bold; font-style: italic;">White</span>
<span style="text-decoration-line: underline;">RED</span><span style="color:blue">blue</span>';

function next_format($str)
{
 $array=array("text-decoration-line"=>"underline","font-weight"=>"bold","font-style"=>"italic");
 foreach ($array as $key=>$val)
 {
 if($str[1]==$key && $str[2]==$val)
 {
 return $str[1].': '.$str[2].";";
 }
 }
 return '';
 
}
function next_span($matches)
{
 $needFormat=preg_replace_callback('/([a-z\-]+):\s*([^;]+)(;|)/ism',"next_format",$matches[2]);
 return $matches[1].$needFormat.$matches[3];
 
}
 echo preg_replace_callback(
 "/(\<span\s+style\=\")([^\"]+)(\">)/ism",
 "next_span",
 $str);
?>

mojo ¶

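3 years ago

Why does <?php preg_match_all('/(?:^|\s)(ABC|XYZ)(?:\s|$)/i', 'ABC XYZ', $match) ?> find only 'ABC'?

Because the first full match is 'ABC ', including the trailing space. That space has been consumed and is not available for further processing.

Use a lookbehind and a lookahead to solve this problem: <?php preg_match_all('/(?<=^|\s)(ABC|XYZ)(?=\s|$)/i', 'ABC XYZ', $match) ?>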
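The two patterns from this note can be compared side by side. A small sketch; the variable names are only illustrative:

```php
<?php
$subject = 'ABC XYZ';

// The trailing (?:\s|$) consumes the space, so nothing is left to satisfy
// (?:^|\s) in front of XYZ.
$consuming = preg_match_all('/(?:^|\s)(ABC|XYZ)(?:\s|$)/i', $subject, $m1);

// Lookarounds assert the surrounding whitespace without consuming it.
$lookaround = preg_match_all('/(?<=^|\s)(ABC|XYZ)(?=\s|$)/i', $subject, $m2);

var_dump($consuming, $lookaround);
```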

chris at ocproducts dot com ¶

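3 years ago

If PREG_OFFSET_CAPTURE is set, unmatched captures (i.e. ones made optional with '?') will not be present in the result array. This is presumably because there is no offset for them, and so the original PHP developers decided it was best to leave them out.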

qdinar at gmail dot com ¶

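6 years ago

When a regex can match both a longer and a shorter version of a string,
only one of those versions is captured.
When the regex matches at one position of the string,
only one match is stored in matches[0] for that position.
If ? is used, the regex is greedy and captures the longer version;
if | is used, the first alternative that matches is captured.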
<?php
preg_match_all('/ab|abc/','abc',$m);
var_dump($m);
preg_match_all('/abc?/','abc',$m);
var_dump($m);
?>
For both, one might expect ['ab', 'abc'] in $m[0], but that is not the case;
they actually output [['ab']] and [['abc']]:
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(2) "ab"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "abc"
  }
}

b3forgames at gmail dot com ¶

1 year ago

Example:
$file = file_get_contents('file');
if(preg_match_all('#Task To Run(.*)#s', $file, $m)) {
var_dump($m);
}

No output...

preg_match_all does not work if the file starts with BOM bytes (FF FE), which here mark the file as UTF-16LE:

╰─$ head -n1 file | hexdump -C
00000000 ff fe 48 00 6f 00 73 00 74 00 4e 00 61 00 6d 00 |..H.o.s.t.N.a.m.|

Clear the BOM via dos2unix:

╰─$ dos2unix file
dos2unix: converting UTF-16LE file file to UTF-8 Unix format...

Check again:

╰─$ head -n1 file | hexdump -C
00000000 48 6f 73 74 4e 61 6d 65 3a 20 20 20 20 20 20 20 |HostName: |

Great! Now preg_match_all works fine.

elyknosrac at gmail dot com ¶

15 years ago

I wrote a rather handy function using preg_match_all.


<?php

function reg_smart_replace($pattern, $replacement, $subject, $replacementChar = "$$$", $limit = -1)
{
    if (! $pattern || ! $subject || ! $replacement ) { return false; }

    preg_match_all($pattern, $subject, $matches);

    // Keep only the first $limit full matches when a limit is given
    $found = $limit > -1 ? array_slice($matches[0], 0, $limit) : $matches[0];

    foreach ($found as $match) {
        // str_replace() stands in for ereg_replace(), which was removed in PHP 7
        $rep = str_replace($replacementChar, $match, $replacement);
        $subject = str_replace($match, $rep, $subject);
    }

    return $subject;
}
?>

This function can turn blocks of text into clickable links or whatever else you like. Example:


<?php 
reg_smart_replace(EMAIL_REGEX, '<a href="mailto:$$$">$$$</a>', $description) 
?> 
This turns all email addresses into actual links.

The $$$ is simply replaced with the text found by the regex. If you cannot use $$$, use the fourth parameter, $replacementChar.

MonkeyMan ¶

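16 years ago

Here is a way to match everything on the page and perform an action for each match as you go. I had used this idiom in other languages, but it does not seem to be very common in PHP.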

<?php
function custom_preg_match_all($pattern, $subject)
{
    $offset = 0;
    $match_count = 0;
    while (preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, $offset))
    {
        // Increment the counter
        $match_count++;

        // Get the byte offset and byte length (assuming single-byte encoding)
        $match_start = $matches[0][1];
        $match_length = strlen($matches[0][0]);

        // (Optional) Transform $matches to the format it usually has (without PREG_OFFSET_CAPTURE set)
        $new_matches = array();
        foreach ($matches as $k => $match) $new_matches[$k] = $match[0];
        $matches = $new_matches;

        // Your code here
        echo "Match number $match_count, at byte offset $match_start, $match_length bytes long: ".$matches[0]."\r\n";

        // Update the offset to the end of the match
        // (an empty match advances by one byte so the loop cannot stall)
        $offset = $match_start + max($match_length, 1);
    }

    return $match_count;
}
?>

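Note that the offsets returned are byte values (not necessarily numbers of characters), so you will have to make sure the data is single-byte encoded. (Or have a look at paolo mosna's strByte function on the strlen manual page.)
I would be interested to know how this method performs speed-wise against using preg_match_all and then looping through the results.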

matt at lvl99 dot com ¶

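9 years ago

I had been crafting and testing some regex patterns online with the Regex101 tool and a `preg_match_all()` tester, and found that the patterns I wrote worked fine there but not in my own code.

My problem was that I had not double-escaped the backslash characters.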

<?php
// Input test
$input = "\"something\",\"something here\",\"some\nnew\nlines\",\"this is the end\"";

// Works in online regex testers, but does not work in PHP
preg_match_all( "/(?:,|^)(?<!\\)\".*?(?<!\\)\"(?:(?=,)|$)/s", $input, $matches );

/*
Outputs: NULL
*/

// Works in online regex testers, and works in PHP
preg_match_all( "/(?:,|^)(?<!\\\\)\".*?(?<!\\\\)\"(?:(?=,)|$)/s", $input, $matches );

/*
Outputs:
array(2) {
 [0]=>
 array(4) {
 [0]=>
 string(11) ""something""
 [1]=>
 string(17) ","something here""
 [2]=>
 string(17) ","some
new
lines""
 [3]=>
 string(18) ","this is the end""
 }
 [1]=>
 array(4) {
 [0]=>
 string(9) "something"
 [1]=>
 string(14) "something here"
 [2]=>
 string(14) "some
new
lines"
 [3]=>
 string(15) "this is the end"
 }
}
*/
?>
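One way to sidestep the double escaping is to write the pattern in a nowdoc, where PHP unescapes nothing and the pattern reads exactly as PCRE sees it. The following is a simplified sketch, not the pattern from the note above; the input string and variable names are made up.

```php
<?php
// Single quotes keep \" as two characters: a backslash and a double quote.
$input = '"a","b\"c","d"';

// Double-quoted PHP string: four backslashes are needed to hand PCRE two.
$doubleQuoted = "/\"(.*?)(?<!\\\\)\"/s";

// Nowdoc: PHP leaves every backslash alone.
$nowdoc = <<<'REGEX'
/"(.*?)(?<!\\)"/s
REGEX;

$count = preg_match_all($nowdoc, $input, $matches);
print_r($matches[1]);
```

Both variables hold the same pattern; the nowdoc version is simply the one that can be pasted unchanged into an online tester.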

phektus at gmail dot com ¶

17 years ago

If you would like to include double quotes in a regex used with preg_match_all, try escaping three times, as in: \\\"

For example, the pattern:
'/<table>[\s\w\/<>=\\\"]*<\/table>/'

should be able to match:
<table>
<row>
<col align="left" valign="top">a</col>
<col align="right" valign="bottom">b</col>
</row>
</table>
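.. with everything there is under those table tags.

I am not really sure why this is so, but I tried just the double quote with one or even two escape characters and it would not work. In my frustration I added another one, and then it did.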

royaltm75 at gmail dot com ¶

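15 years ago

I received complaints that my html2a() code (see below) does not work in some cases.
It is, however, not a problem with the algorithm or the procedure, but with the PCRE recursive stack limits.

If you use recursive PCRE (?R) you should remember to increase these two ini settings:

ini_set('pcre.backtrack_limit', 10000000);
ini_set('pcre.recursion_limit', 10000000);

But be warned: (from php.ini)

;Please note that if you set this value to a high number you may consume all
;the available process stack and eventually crash PHP (due to reaching the
;stack size limit imposed by the Operating System).

I wrote this example mainly to demonstrate the power of the PCRE language, not the power of its implementation :)

But if you like it, use it, of course at your own risk.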

fseverin at free dot fr ¶

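12 years ago

As I intended to create, for my own purposes, a clean PHP class to work with XML files, combining the DOM and simplexml functions, I ran into a small but very annoying problem: the offsets in a path are not numbered the same way in the two.

That is to say, for example, if I get a DOM xpath object it appears as:
/ANODE/ANOTHERNODE/SOMENODE[9]/NODE[2]
while as a simplexml object it would be equivalent to:
ANODE->ANOTHERNODE->SOMENODE[8]->NODE[1]

So you see what I mean? I used preg_match_all to solve that problem, and finally, after some hours of headache, I arrived at this (the variable names are in French because I am French, sorry). I hope it is useful to some of you: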


<?php
function decrease_string($chaine)
{
    /* Retrieve every number in the original string together with its offset: */
    preg_match_all("/[0-9]+/", $chaine, $out, PREG_OFFSET_CAPTURE);
    for ($i = 0; $i < sizeof($out[0]); $i++)
    {
        $longueurnombre = strlen((string)$out[0][$i][0]);
        $taillechaine = strlen($chaine);
        // Cut the string into 3 pieces
        $debut = substr($chaine, 0, $out[0][$i][1]);
        $milieu = ($out[0][$i][0]) - 1;
        $fin = substr($chaine, $out[0][$i][1] + $longueurnombre, $taillechaine);
        /* For 10, 100, 1000 the string gets shorter, which shifts all later offsets, so they are reduced by 1 */
        if (preg_match('#^10+$#', $out[0][$i][0]))
        {
            for ($j = $i + 1; $j < sizeof($out[0]); $j++)
            {
                $out[0][$j][1] = $out[0][$j][1] - 1;
            }
        }
        $chaine = $debut . $milieu . $fin;
    }
    return $chaine;
}
?>

dolbegraeb ¶

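16 years ago

Please note that the function from "mail at SPAMBUSTER at milianw dot de" can result in invalid xhtml in some cases. I think I used it in the right way, but my result was something like this:

<img src="./img.jpg" alt="nice picture" />foo foo foo foo </img>

Correct me if I am wrong.
I will see when there is time to fix that. -.-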

ajeet dot nigam at icfaitechweb dot com ¶

10 years ago

Here, http://tryphpregex.com/, is a PHP-based online regex editor that helps you test your regular expressions with real-time highlighting of regex matches as you type.

DarkSide ¶

10 years ago

This is very useful for combining matches:
$a = array_combine($matches[1], $matches[2]);
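A minimal sketch of that one-liner in context, assuming a pattern with two capture groups where the first supplies the keys and the second the values (the input is made up):

```php
<?php
$str = "a: 1\nb: 2\nc: 3";

preg_match_all('/(\w+): (\d+)/', $str, $matches);

// Group 1 supplies the keys, group 2 the values.
$a = array_combine($matches[1], $matches[2]);
print_r($a);
```

If the same key is matched more than once, the later value overwrites the earlier one.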

satyavvd at ymail dot com ¶

13 years ago

To extract the fields from a CSV string (since before PHP 5.3 you cannot use the str_getcsv() function), here is the regex:

<?php

$csvData = <<<EOF
10,'20',"30","'40","'50'","\"60","70,80","09\\/18,/\"2011",'a,sdfcd'
EOF;

$reg = <<<EOF
/ 
 ( 
 ( 
 ([\'\"]) 
 ( 
 ( 
 [^\'\"] 
 | 
 (\\\\.) 
 )* 
 ) 
 (\\3) 
 | 
 ( 
 [^,] 
 | 
 (\\\\.) 
 )* 
 ),) 
 /x 
EOF; 
 
preg_match_all($reg,$csvData,$matches); 
 
// extract the csv fields
print_r($matches[2]); 
?>

mr davin ¶

17 years ago

<?php
// Returns an array of the strings found between the start and end delimiters
 function findinside($start, $end, $string) {
 preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $string, $m);
 return $m[1];
 }
 
 $start = "mary has";
 $end = "lambs.";
 $string = "mary has 6 lambs. phil has 13 lambs. mary stole phil's lambs. now mary has all the lambs.";

 $out = findinside($start, $end, $string);

 print_r ($out);

/* Results in
(
 [0] => 6 
 [1] => all the 
)
*/ 
?>

royaltm75 at NOSPAM dot gmail dot com ¶

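15 years ago

The power of pregs is limited only by your *imagination* :)
I wrote this html2a() function using preg recursive match (?R), which provides quite safe and bulletproof html/xml extraction: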
<?php
function html2a ( $html ) {
 if ( !preg_match_all( '
@
\<\s*?(\w+)((?:\b(?:\'[^\']*\'|"[^"]*"|[^\>])*)?)\>
((?:(?>[^\<]*)|(?R))*)
\<\/\s*?\\1(?:\b[^\>]*)?\>
|\<\s*(\w+)(\b(?:\'[^\']*\'|"[^"]*"|[^\>])*)?\/?\>
@uxis', $html = trim($html), $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER) )
 return $html;
 $i = 0;
 $ret = array();
 foreach ($m as $set) {
 if ( strlen( $val = trim( substr($html, $i, $set[0][1] - $i) ) ) )
 $ret[] = $val;
 $val = $set[1][1] < 0 
 ? array( 'tag' => strtolower($set[4][0]) )
 : array( 'tag' => strtolower($set[1][0]), 'val' => html2a($set[3][0]) );
 if ( preg_match_all( '
/(\w+)\s*(?:=\s*(?:"([^"]*)"|\'([^\']*)\'|(\w+)))?/usix
', isset($set[5]) && $set[2][1] < 0
 ? $set[5][0]
 : $set[2][0]
 ,$attrs, PREG_SET_ORDER ) ) {
 foreach ($attrs as $a) {
 $val['attr'][$a[1]]=$a[count($a)-1];
 }
 }
 $ret[] = $val;
 $i = $set[0][1]+strlen( $set[0][0] );
 }
 $l = strlen($html);
 if ( $i < $l )
 if ( strlen( $val = trim( substr( $html, $i, $l - $i ) ) ) )
 $ret[] = $val;
 return $ret;
}
?>

Now let's try it with this example: (there are some really nasty xhtml compliance errors, but ... we should not worry)

<?php
$html = <<<EOT
some leftover text...
 < DIV class=noCompliant style = "text-align:left;" >
... and some other ...
< dIv > < empty> </ empty>
 <p> This is yet another text <br >
 that wasn't <b>compliant</b> too... <br />
 </p>
 <div class="noClass" > this one is better but we don't care anyway </div ><P>
 <input type= "text" name ='my "name' value = "nothin really." readonly>
end of paragraph </p> </Div> </div> some trailing text 
EOT;

$a = html2a($html);
//now we will make some neat html out of it
echo a2html($a);

function a2html ( $a, $in = "" ) {
 if ( is_array($a) ) {
 $s = "";
 foreach ($a as $t)
 if ( is_array($t) ) {
 $attrs=""; 
 if ( isset($t['attr']) )
 foreach( $t['attr'] as $k => $v )
 $attrs.=" ${k}=".( strpos( $v, '"' )!==false ? "'$v'" : "\"$v\"" );
 $s.= $in."<".$t['tag'].$attrs.( isset( $t['val'] ) ? ">\n".a2html( $t['val'], $in." " ).$in."</".$t['tag'] : "/" ).">\n";
 } else
 $s.= $in.$t."\n";
 } else {
 $s = empty($a) ? "" : $in.$a."\n";
 }
 return $s;
}
?>
This will produce:
some leftover text...
<div class="noCompliant" style="text-align:left;">
... and some other ...
<div>
<empty>
</empty>
<p>
This is yet another text
<br/>
that wasn't
<b>
compliant
</b>
too...
<br/>
</p>
<div class="noClass">
this one is better but we don't care anyway
</div>
<p>
<input type="text" name='my "name' value="nothin really." readonly="readonly"/>
end of paragraph
</p>
</div>
</div>
some trailing text

avengis at gmail dot com ¶

15 years ago

The next function works with almost any complex xml/xhtml string:


<?php 
/** 
* Finds and closes unclosed XML tags
**/ 
function close_tags($text) { 
 $patt_open = "%((?<!</)(?<=<)[\s]*[^/!>\s]+(?=>|[\s]+[^>]*[^/]>)(?!/>))%"; 
 $patt_close = "%((?<=</)([^>]+)(?=>))%"; 
 if (preg_match_all($patt_open,$text,$matches)) 
 { 
 $m_open = $matches[1]; 
 if(!empty($m_open)) 
 { 
 preg_match_all($patt_close,$text,$matches2); 
 $m_close = $matches2[1]; 
 if (count($m_open) > count($m_close)) 
 { 
 $m_open = array_reverse($m_open); 
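 // Count how many closing tags exist for each tag name
 $c_tags = array_count_values($m_close);
 // Append a closing tag for each opening tag that has none left to pair with
 foreach ($m_open as $tag) {
 if (($c_tags[$tag] ?? 0) > 0) $c_tags[$tag]--;
 else $text .= '</'.$tag.'>';
 }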
 } 
 } 
 } 
 return $text; 
} 
?>

vojjov dot artem at ya dot ru ¶

9 years ago

// Here is a function that lets you use preg_match_all with multiple patterns

function getMatches($pattern, $subject) {
$matches = array();

if (is_array($pattern)) {
foreach ($pattern as $p) {
$m = getMatches($p, $subject);

foreach ($m as $key => $match) {
if (isset($matches[$key])) {
$matches[$key] = array_merge($matches[$key], $m[$key]);
} else {
$matches[$key] = $m[$key];
                }
            }
        }
} else {
preg_match_all($pattern, $subject, $matches);
    }

return $matches;
}

$patterns = array(
'/<span>(.*?)<\/span>/',
'/<a href=".*?">(.*?)<\/a>/'
);

$html = '<span>some text</span>';
$html .= '<span>some text in another span</span>';
$html .= '<a href="path/">here is the link</a>';
$html .= '<address>address is here</address>';
$html .= '<span>here is one more span</span>';

$matches = getMatches($patterns, $html);

print_r($matches); // result is below

/*
Array
(
    [0] => Array
        (
            [0] => <span>some text</span>
            [1] => <span>some text in another span</span>
            [2] => <span>here is one more span</span>
            [3] => <a href="path/">here is the link</a>
        )

    [1] => Array
        (
            [0] => some text
            [1] => some text in another span
            [2] => here is one more span
            [3] => here is the link
        )

)
*/
