The IntlCodePointBreakIterator class

(PHP 5 >= 5.5.0, PHP 7, PHP 8)

Introduction

This break iterator identifies the boundaries between UTF-8 code points.
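As a minimal sketch of what "boundaries between code points" means in practice, the snippet below steps through a short string and records each boundary together with the code point that was just passed. It assumes the intl extension is loaded; `codePointBoundaries()` is a hypothetical helper introduced only for this illustration, and the sample string is arbitrary.

```php
<?php
// Sketch only: requires the intl extension. codePointBoundaries() is a
// hypothetical helper, not part of the intl API.
function codePointBoundaries(string $text): array
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($text);

    $result = [];
    $it->first(); // position the iterator at offset 0
    while (($offset = $it->next()) !== IntlBreakIterator::DONE) {
        // $offset is the byte offset just past the code point stepped over;
        // getLastCodePoint() reports that code point as an integer.
        $result[] = [$offset, $it->getLastCodePoint()];
    }
    return $result;
}

foreach (codePointBoundaries("a€😁") as [$offset, $codePoint]) {
    printf("boundary at byte %d after U+%04X\n", $offset, $codePoint);
}
?>
```

Because offsets are counted in UTF-8 bytes rather than characters, the boundaries for this input should fall at 1, 4 and 8: one byte for "a", three for "€" and four for "😁".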

Class synopsis

class IntlCodePointBreakIterator extends IntlBreakIterator {
/* Inherited constants */
/* Methods */
/* Inherited methods */
public IntlBreakIterator::getPartsIterator(string $type = IntlPartsIterator::KEY_SEQUENTIAL): IntlPartsIterator
}
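
The inherited getPartsIterator() method listed above yields the text between consecutive boundaries, so with a code point iterator each part is a single code point. The following is a rough sketch of using it to split a string, assuming the intl extension is available; `splitCodePoints()` is a hypothetical helper name, not something the extension provides.

```php
<?php
// Sketch only: requires the intl extension. splitCodePoints() is a
// hypothetical helper name.
function splitCodePoints(string $text): array
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($text);
    // Passing false discards the iterator keys and reindexes from 0.
    return iterator_to_array($it->getPartsIterator(), false);
}

var_dump(splitCodePoints("a€😁") === ["a", "€", "😁"]); // bool(true)
?>
```

The iterator keys are dropped here on purpose, since the sketch only needs the substrings themselves.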

User Contributed Notes 1 note

Matt Kynx
2 years ago
Example of using this to find all code points in a string that cannot be transliterated to Latin-ASCII:

<?php

$string = "Народm, Intl gurus get paid €10000/hr 😁";

// Normalize, romanize, then reduce Latin characters to ASCII where possible.
$latinAscii = Transliterator::create('NFC; Any-Latin; Latin-ASCII;');
$transliterated = $latinAscii->transliterate($string);

$codePoints = IntlBreakIterator::createCodePointInstance();
$codePoints->setText($transliterated);

// Each part yielded by the iterator is a single code point.
foreach ($codePoints->getPartsIterator() as $char) {
    $ord = IntlChar::ord($char);
    // Report anything that is still above U+00FF after transliteration.
    if (255 < $ord) {
        echo IntlChar::charName($ord) . "\n";
    }
}
?>

Outputs:
EURO SIGN
GRINNING FACE WITH SMILING EYES