mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)

mb_decode_numericentity — 將 HTML 數值字串參考解碼為字元

說明

mb_decode_numericentity(字串 $string, 陣列 $map, ?字串 $encoding = null): 字串

將指定區塊中字串 string 的數字字串參考轉換為字元。

參數

string: 要解碼的字串。
map: map 是一個陣列，指定要轉換的程式碼區域。
encoding: encoding 參數是字元編碼。如果省略或為 null，則會使用內部字元編碼值。
is_hex: 此參數未使用。

回傳值

轉換後的字串。

錯誤／例外

如果 map 不是整數列表，則會拋出 ValueError。

更新日誌

版本	說明
8.4.0	如果 `map` 不是整數列表，mb_decode_numericentity() 現在會拋出 ValueError。
8.0.0	`encoding` 現在可以為 null。

範例

範例 #1 map 範例

<?php
$convmap = array (
 int start_code1, int end_code1, int offset1, int mask1,
 int start_code2, int end_code2, int offset2, int mask2,
 ........
 int start_codeN, int end_codeN, int offsetN, int maskN );
// 指定 start_codeN 和 end_codeN 的 Unicode 值
// 將 offsetN 加到值上，並與 maskN 進行位元「AND」運算，
// 然後將值轉換為數字字串參考。
?>

範例 #2 map 範例跳脫 JavaScript 字串

<?php
function escape_javascript_string($str) {
 $map = [
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,0,0, // 49
 0,0,0,0,0,0,0,0,1,1,
 1,1,1,1,1,0,0,0,0,0,
 0,0,0,0,0,0,0,0,0,0,
 0,0,0,0,0,0,0,0,0,0,
 0,1,1,1,1,1,1,0,0,0, // 99
 0,0,0,0,0,0,0,0,0,0,
 0,0,0,0,0,0,0,0,0,0,
 0,0,0,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1, // 149
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1, // 199
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1,
 1,1,1,1,1,1,1,1,1,1, // 249
 1,1,1,1,1,1,1, // 255
 ];
 // Char encoding is UTF-8
 $mblen = mb_strlen($str, 'UTF-8');
 $utf32 = bin2hex(mb_convert_encoding($str, 'UTF-32', 'UTF-8'));
 for ($i=0, $encoded=''; $i < $mblen; $i++) {
 $u = substr($utf32, $i*8, 8);
 $v = base_convert($u, 16, 10);
 if ($v < 256 && $map[$v]) {
 $encoded .= '\\x'.substr($u, 6,2);
 } else if ($v == 2028) {
 $encoded .= '\\u2028';
 } else if ($v == 2029) {
 $encoded .= '\\u2029';
 } else {
 $encoded .= mb_convert_encoding(hex2bin($u), 'UTF-8', 'UTF-32');
 }
 }
 return $encoded;
}
 
// Test data
$convmap = [ 0x0, 0xffff, 0, 0xffff ];
$msg = '';
for ($i=0; $i < 1000; $i++) {
 // chr() cannot generate correct UTF-8 data larger value than 128, use mb_decode_numericentity().
 $msg .= mb_decode_numericentity('&#'.$i.';', $convmap, 'UTF-8');
}
 
// var_dump($msg);
var_dump(escape_javascript_string($msg));

參見

mb_encode_numericentity() - 將字元編碼為 HTML 數字字串參考

發現問題了嗎？

學習如何改進此頁面 • 提交拉取請求 • 回報錯誤

＋新增筆記

使用者貢獻的筆記 4 則筆記

向上

向下

abderrahmanekaddour dot aissat at gmail dot com ¶

2 年前

<?php

// 以下說明文件基於對 php mbr 原始碼的理解
// 首先，為了優化 php 的工作效率
// 字串必須包含 "&"，否則 php 不會嘗試解碼。
// 對於映射：int start_codeN, int end_codeN, int offsetN, int maskN
// 實體必須在 [start_codeN, end_codeN] 範圍內，如果實體大於或小於此範圍
// mb_decode_numericentity 將忽略解碼過程並按原樣返回 $string。
// 在 php 的後期版本中，$map："必須包含 4 的倍數個元素"

$map = [ 0x0, 0xFFFF, 0, 0];
echo mb_decode_numericentity('&#109;', $map ); // 結果 "m"
// 如果 offsetN = 1，結果為 "l"；小數越大，OR 運算元的使用越多。
$map_2 = [ 0x0, 0xFFFF, 60, 0];
echo mb_decode_numericentity('&#109;', $map_2 ); // 解碼 ( &#49; ) 結果："1"

// 實體參考以檢查結果： https://cs.stanford.edu/people/miles/iso8859.html#ISO

?>

向上

向下

donovan at conduit it ¶

18 年前

請注意，目前看來 mb_decode_numericentity() 似乎只適用於十進制實體，而不適用於十六進制實體。這個事實可以為我節省一個小時的除錯時間。

對於需要轉換十六進制實體的人，請嘗試先使用 preg_replace() 和 hexdec() 函數的組合將它們全部轉換為十進制實體。

向上

向下

dev at glossword info ¶

21 年前

兩個很棒的日常使用函數

/* 將任何 HTML 實體轉換為字元 */
function my_numeric2character($t)
{
$convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* 將任何字元轉換為 HTML 實體 */
function my_character2numeric($t)
{
$convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ? ? ');

向上

向下

-2

fernandosilveira at yahoo dot com dot br ¶

4 年前

請小心！
除了將數字實體轉換為指定目標編碼的字元外，此函數還會將輸入字串中的每個字元編碼為指定的目標編碼，即使這些字元位於轉換映射定義的範圍之外。

＋新增筆記