soundex

(PHP 4, PHP 5, PHP 7, PHP 8)

soundex - Calculate the soundex key of a string

Description

soundex(string $string): string

Calculates the soundex key of string.

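Soundex keys have the property that words pronounced similarly produce the same soundex key. They can therefore be used to simplify searches in databases where you know the pronunciation but not the spelling.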

This particular soundex function is one described by Donald Knuth in "The Art Of Computer Programming, vol. 3: Sorting And Searching", Addison-Wesley (1973), pp. 391-392.
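Here is a minimal sketch of such a lookup. It assumes the candidate names are already in a PHP array, and the helper name find_by_sound() is hypothetical, not part of PHP. It keeps every candidate whose soundex key equals the key of the search term.

```php
<?php
// Hypothetical helper: keep every candidate whose soundex key equals the
// key of the search term, i.e. the names that "sound like" the query.
function find_by_sound(string $query, array $candidates): array
{
    $key = soundex($query);
    return array_values(array_filter(
        $candidates,
        fn (string $name): bool => soundex($name) === $key
    ));
}

// Euler and Ellery both have the key E460
print_r(find_by_sound('Ellery', ['Euler', 'Gauss', 'Knuth', 'Ellery']));
```

In a real database the keys would usually be precomputed and indexed, not recalculated for every row.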

Parameters

string: The input string.

Return Values

Returns the soundex key as a string of four characters. If string contains at least one letter, the returned string starts with a letter. Otherwise "0000" is returned.
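The sketch below shows the rules concretely. It is an illustrative reimplementation, not PHP's source. It is assumed, but not guaranteed, to agree with the built-in function for ASCII input. The name soundex_sketch() and its $digits parameter are hypothetical.

The sketch follows these rules:
- Each letter maps to a digit: B F P V = 1, C G J K Q S X Z = 2, D T = 3, L = 4, M N = 5, R = 6.
- Vowels and H, W, Y add no digit but separate repeated digits.
- Non-letters are skipped.

```php
<?php
// Illustrative sketch, assumed (not guaranteed) to match PHP's built-in
// soundex() for ASCII input. Non-letters are skipped; vowels and H, W, Y
// add no digit but separate repeated digits.
function soundex_sketch(string $string, int $digits = 3): string
{
    // Digit for each letter A..Z; '0' means "no digit"
    $table = '01230120022455012623010202';
    $key = '';
    $last = '';

    foreach (str_split(strtoupper($string)) as $ch) {
        if ($ch < 'A' || $ch > 'Z') {
            continue; // not an ASCII letter
        }
        $code = $table[ord($ch) - ord('A')];
        if ($key === '') {
            $key = $ch; // keep the first letter itself
        } elseif ($code !== $last && $code !== '0') {
            $key .= $code; // a new consonant digit
        }
        $last = $code;
        if (strlen($key) === $digits + 1) {
            break;
        }
    }

    // No letters at all: all zeros; otherwise pad the digits with zeros
    return str_pad($key, $digits + 1, '0');
}
```

With the default of three digits, "spectacular" and "spectacle" both give S123. Passing a larger $digits, as in soundex_sketch('spectacular', 6), keeps encoding past the third consonant and tells them apart.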

Changelog

Version	Description
8.0.0	Prior to this version, calling the function with an empty string returned `false` for no particular reason.

Examples

Example #1 Soundex Examples

<?php
soundex("Euler") == soundex("Ellery"); // E460
soundex("Gauss") == soundex("Ghosh"); // G200
soundex("Hilbert") == soundex("Heilbronn"); // H416
soundex("Knuth") == soundex("Kant"); // K530
soundex("Lloyd") == soundex("Ladd"); // L300
soundex("Lukasiewicz") == soundex("Lissajous"); // L222
?>

See Also

levenshtein() - Calculate Levenshtein distance between two strings
metaphone() - Calculate the metaphone key of a string
similar_text() - Calculate the similarity between two strings

User Contributed Notes 20 notes

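nicolas dot zimmer at einfachmarke dot de

16 years ago

Since soundex() does not produce optimal results for German,
we wrote a function that implements the so-called Cologne phonetics
(Kölner Phonetik).

Please find the code below; we hope it helps.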

<?php
/**
 * A function for retrieving the Kölner Phonetik value of a string
 * 
 * As described at http://de.wikipedia.org/wiki/Kölner_Phonetik
 * Based on Hans Joachim Postel: Die Kölner Phonetik. 
 * Ein Verfahren zur Identifizierung von Personennamen auf der 
 * Grundlage der Gestaltanalyse. 
 * in: IBM-Nachrichten, 19. Jahrgang, 1969, S. 925-931
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * @package phonetics
 * @version 1.0
 * @link http://www.einfachmarke.de
 * @license GPL 3.0 <http://www.gnu.org/licenses/>
 * @copyright 2008 by einfachmarke.de
 * @author Nicolas Zimmer <nicolas dot zimmer at einfachmarke.de>
 */

function cologne_phon($word){
 
 /**
 * @param string $word the string to analyze
 * @return string $value the Cologne phonetic value
 * @access public
 */
 
 //prepare for processing
 $word=strtolower($word);
 $substitution=array(
 "ä"=>"a", "Ä"=>"a",
 "ö"=>"o", "Ö"=>"o",
 "ü"=>"u", "Ü"=>"u",
 "ß"=>"ss",
 "ph"=>"f"
 );

 foreach ($substitution as $letter=>$replacement) {
 $word=str_replace($letter,$replacement,$word);
 }
 
 $len=strlen($word);
 
 //exception rules
 $exceptionsLeading=array(
 4=>array("ca","ch","ck","cl","co","cq","cu","cx"),
 8=>array("dc","ds","dz","tc","ts","tz")
 );
 
 $exceptionsFollowing=array("sc","zc","cx","kx","qx");
 
 //coding table (the later "c"=>8 entry wins over "c"=>4)
 $codingTable=array(
 0=>array("a","e","i","j","o","u","y"),
 1=>array("b","p"),
 2=>array("d","t"),
 3=>array("f","v","w"),
 4=>array("c","g","k","q"),
 48=>array("x"),
 5=>array("l"),
 6=>array("m","n"),
 7=>array("r"),
 8=>array("c","s","z"),
 );
 
 $value=array();
 for ($i=0;$i<$len;$i++){
 $value[$i]="";
 
 //exceptions; substr() avoids reading past the end of the word
 $next=substr($word,$i,2);
 if ($i==0 AND $next=="cr") $value[$i]=4;
 
 foreach ($exceptionsLeading as $code=>$letters) {
 if (in_array($next,$letters)) $value[$i]=$code;
 }
 
 if ($i!=0 AND in_array(substr($word,$i-1,2),$exceptionsFollowing)) {
 $value[$i]=8;
 }
 
 //regular coding
 if ($value[$i]===""){
 foreach ($codingTable as $code=>$letters) {
 if (in_array($word[$i],$letters)) $value[$i]=$code;
 }
 }
 }
 
 //drop letters without a code (e.g. "h"), collapse runs of identical
 //codes, then remove vowel codes (0) except in the first position
 $codes=array_values(array_filter($value,function($v){ return $v!==""; }));
 $result=array();
 foreach ($codes as $i=>$code) {
 if ($i>0 AND $code===$codes[$i-1]) continue;
 if ($i>0 AND $code===0) continue;
 $result[]=$code;
 }
 
 return implode("",$result);
 
}

?>

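fie at myrealbox dot com

21 years ago

administrator at zinious dot com

Sorry, but your code does not conform to the soundex standard.
Here are the results of my code, your code and the built-in function:

String: rest
R620 running the administrator's function 0.009452
R230 running cg's function 0.001779
R230 running the built-in soundex function 9.4999999999956E-005

String: reset
R620 running the administrator's function 0.0055900000000001
R230 running cg's function 0.00091799999999997
R230 running the built-in soundex function 0.00010600000000005

I don't know why the built-in one occasionally shows 9.xxx; I find that strange...
My code is at the very bottom. These tests were run before the soundex modification I describe below.
By the way, for the original specification of the soundex algorithm, go to
http://www.star-shine.net/~functionifelse/GFD/?word=soundex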

dalibor dot toth at podravka dot hr

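Yes, it may be a little sad that it gives you the same code,
and even metaphone has this problem.
But some people might not want that much precision. Suppose someone
on a search engine (let's call it shmoogle) searched for
"php array reset" and then for "php array rest".
Shmoogle might then return something about beds
(if they were all dumb and didn't treat the first word
as the more important one). So shmoogle might need
less precision in a case like that, but nonetheless...
My solution is to append the syllable count to the end of the string, making it 5 characters long.
It works like this...

The code is at: http://star-shine.net/~functionifelse/cg_soundex.php

Or, if you just want to use the built-in soundex function: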

$str = soundex($str).cg_sylc($str);
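The same idea can be sketched without the external code. The vowel-group count below is a stand-in heuristic, not the author's cg_sylc(). The function name soundex_with_syllables() is hypothetical.

```php
<?php
// Hypothetical variant: append a syllable estimate to the standard key.
// Syllables are approximated by counting groups of consecutive vowels
// (a swapped-in heuristic, not the author's cg_sylc()).
function soundex_with_syllables(string $word): string
{
    $groups = preg_match_all('/[aeiouy]+/i', $word);
    return soundex($word) . max(1, $groups);
}

echo soundex_with_syllables('rest'), "\n";  // R2301
echo soundex_with_syllables('reset'), "\n"; // R2302
```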

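More or less revolutionary, probably less...
This function only works on single words. I'd like to see someone
modify it to split the input and loop over it to get the cg_soundex of each word.
That would be interesting ;)
I'd also like to suggest that the PHP/Zend/Apache folks who develop PHP
add an optional extra argument that lets the user specify something like:

soundex("string",SYL);

This would return the syllable count at the end of the string.
High-precision sound testing, awesome! You could also add VOW for vowels
and CONS for consonants, or anything else you like...
But I really think the syllable count would be very effective.
Well... if this helps anyone, you're welcome... um... best of luck to all of you
in your PHP adventures... oh... and the final results:

syllables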
1 rest
2 reset
metaphone
RST rest
RST reset
soundex
R230 rest
R230 reset

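String: rest
R2301 running cg's function 0.00211
R230 running the built-in soundex function 0.00011299999999997

String: reset
R2302 running cg's function 0.001691
R230 running the built-in soundex function 0.00010399999999999

The built-in function is a bit faster...
So maybe they'll add this option, and we'll have both speed and accuracy.

Silent wind of doom! Whoosh!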

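Dirk Hoeschen - Feenders de

10 years ago

I made a few improvements to Nicolas Zimmer's "Cologne phonetics" function. The arrays' keys and values are swapped, and simple arrays replace the multidimensional ones. As a result, the loops and iterations needed to find a character's matching value are no longer necessary.
I put the function into a static class and moved the array declarations out of the function.

The result is more reliable than the original version and five times faster.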

<?php 
class CologneHash {

 static $eLeading = array("ca" => 4, "ch" => 4, "ck" => 4, "cl" => 4, "co" => 4, "cq" => 4, "cu" => 4, "cx" => 4, "dc" => 8, "ds" => 8, "dz" => 8, "tc" => 8, "ts" => 8, "tz" => 8); 

 static $eFollow = array("sc", "zc", "cx", "kx", "qx");

 // "c" is coded 8 by default; the exception tables above produce 4
 static $codingTable = array("a" => 0, "e" => 0, "i" => 0, "j" => 0, "o" => 0, "u" => 0, "y" => 0,
 "b" => 1, "p" => 1, "d" => 2, "t" => 2, "f" => 3, "v" => 3, "w" => 3, "g" => 4, "k" => 4, "q" => 4,
 "x" => 48, "l" => 5, "m" => 6, "n" => 6, "r" => 7, "c" => 8, "s" => 8, "z" => 8);

 public static function getCologneHash($word)
 {
 if (empty($word)) return false;
 // same preprocessing as the original function
 $word = str_replace(array("ä", "Ä", "ö", "Ö", "ü", "Ü", "ß", "ph"),
 array("a", "a", "o", "o", "u", "u", "ss", "f"), strtolower($word));
 $len = strlen($word);
 $value = array();
 
 for ($i = 0; $i < $len; $i++) {
 $value[$i] = "";
 $pair = substr($word, $i, 2); // current and next letter, if any
 
 // exceptions
 if ($i == 0 && $pair == "cr") {
 $value[$i] = 4;
 }
 
 if (isset(self::$eLeading[$pair])) {
 $value[$i] = self::$eLeading[$pair];
 }

 if ($i != 0 && in_array(substr($word, $i - 1, 2), self::$eFollow)) {
 $value[$i] = 8;
 }
 
 // regular coding
 if ($value[$i] === "" && isset(self::$codingTable[$word[$i]])) {
 $value[$i] = self::$codingTable[$word[$i]];
 }
 }

 // drop letters without a code (e.g. "h"), collapse identical neighbours,
 // and remove vowel codes (0) except in the first position
 $codes = array_values(array_filter($value, function ($v) { return $v !== ""; }));
 $result = array();
 foreach ($codes as $i => $code) {
 if ($i > 0 && ($code === $codes[$i - 1] || $code === 0)) {
 continue;
 }
 $result[] = $code;
 }
 
 return implode("", $result);
 }
 
}
?>

synnus at gmail dot com

9 years ago

<?php
// https://github.com/Fruneau/Fruneau.github.io/blob/master/assets/soundex_fr.php
// http://blog.mymind.fr/blog/2007/03/15/soundex-francais/
function soundex_fr($sIn){
 static $convVIn, $convVOut, $convGuIn, $convGuOut, $accents;
 if (!isset($convGuIn)) {
 $accents = array('É' => 'E', 'È' => 'E', 'Ë' => 'E', 'Ê' => 'E',
 'Á' => 'A', 'À' => 'A', 'Ä' => 'A', 'Â' => 'A', 'Å' => 'A', 'Ã' => 'A',
 'Ï' => 'I', 'Î' => 'I', 'Ì' => 'I', 'Í' => 'I',
 'Ô' => 'O', 'Ö' => 'O', 'Ò' => 'O', 'Ó' => 'O', 'Õ' => 'O', 'Ø' => 'O',
 'Ú' => 'U', 'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U',
 'Ç' => 'C', 'Ñ' => 'N', 'Ç' => 'S', '¿' => 'E',
 'é' => 'e', 'è' => 'e', 'ë' => 'e', 'ê' => 'e',
 'á' => 'a', 'à' => 'a', 'ä' => 'a', 'â' => 'a', 'å' => 'a', 'ã' => 'a',
 'ï' => 'i', 'î' => 'i', 'ì' => 'i', 'í' => 'i',
 'ô' => 'o', 'ö' => 'o', 'ò' => 'o', 'ó' => 'o', 'õ' => 'o', 'ø' => 'o',
 'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
 'ç' => 'c', 'ñ' => 'n');
 $convGuIn = array( 'GUI', 'GUE', 'GA', 'GO', 'GU', 'SCI', 'SCE', 'SC', 'CA', 'CO',
 'CU', 'QU', 'Q', 'CC', 'CK', 'G', 'ST', 'PH');
 $convGuOut = array( 'KI', 'KE', 'KA', 'KO', 'K', 'SI', 'SE', 'SK', 'KA', 'KO',
 'KU', 'K', 'K', 'K', 'K', 'J', 'T', 'F');
 $convVIn = array( '/E?(AU)/', '/([EA])?[UI]([NM])([^EAIOUY]|$)/', '/[AE]O?[NM]([^AEIOUY]|$)/',
 '/[EA][IY]([NM]?[^NM]|$)/', '/(^|[^OEUIA])(OEU|OE|EU)([^OEUIA]|$)/', '/OI/',
 '/(ILLE?|I)/', '/O(U|W)/', '/O[NM]($|[^EAOUIY])/', '/(SC|S|C)H/',
 '/([^AEIOUY1])[^AEIOUYLKTPNR]([UAO])([^AEIOUY])/', '/([^AEIOUY]|^)([AUO])[^AEIOUYLKTP]([^AEIOUY1])/', '/^KN/',
 '/^PF/', '/C([^AEIOUY]|$)/', '/E(Z|R)$/',
 '/C/', '/Z$/', '/(?<!^)Z+/', '/H/', '/W/');
 $convVOut = array( 'O', '1\3', 'A\1',
 'E\1', '\1E\3', 'O',
 'Y', 'U', 'O\1', '9', 
 '\1\2\3', '\1\2\3', 'N',
 'F', 'K\1', 'E',
 'S', 'SE', 'S', '', 'V');
 }

 if ( $sIn === '' ) return str_repeat( ' ', 4 );
 $sIn = strtr( $sIn, $accents);
 $sIn = strtoupper( $sIn );
 $sIn = preg_replace( '`[^A-Z]`', '', $sIn );
 if ( strlen( $sIn ) === 1 ) return str_pad( $sIn, 4 );
 $sIn = str_replace( $convGuIn, $convGuOut, $sIn );
 $sIn = preg_replace( '`(.)\1`', '$1', $sIn );
 $sIn = preg_replace( $convVIn, $convVOut, $sIn);
 $sIn = preg_replace( '`L?[TDX]?S?$`', '', $sIn );
 $sIn = preg_replace( '`(?!^)Y([^AEOU]|$)`', '\1', $sIn);
 $sIn = preg_replace( '`(?!^)[EA]`', '', $sIn);
 return substr( str_pad( $sIn, 4 ), 0, 4 );
}
?>

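cap at capsi dot cx

24 years ago

Unfortunately, soundex() is very sensitive to the first character. There is no way to make it return the same value for Clansy and Klansy. If you want to do phonetic searches on names like these, you still need to write a routine that treats C452 as similar to K452.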
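One way to build such a routine is to drop the first letter of each key before comparing. This sketch assumes that comparing only the three digits is acceptable for your data, and the helper name is hypothetical. It trades the C/K problem for more false positives, because any two names with the same digits now match.

```php
<?php
// Hypothetical helper: compare only the digits of the two soundex keys,
// ignoring their first letters (so "Clansy" C452 and "Klansy" K452 match)
function sounds_alike_ignoring_first(string $a, string $b): bool
{
    return substr(soundex($a), 1) === substr(soundex($b), 1);
}

var_dump(sounds_alike_ignoring_first('Clansy', 'Klansy')); // bool(true)
```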

synnus at gmail dot com

4 years ago

<?php
/* SOUNDEX FRENCH 
Frederic Bouchery 26-Sep-2003
http://www.php-help.net/sources-php/a.french.adapted.soundex.289.html
*/

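function soundex2( $sIn ) {
 // if there is no word, return immediately
 if ( $sIn === '' ) return str_repeat( ' ', 4 );
 // convert all letters to upper case
 $sIn = strtoupper( $sIn );
 // remove accents; the array form of strtr() replaces multibyte (UTF-8)
 // characters whole, and the lower-case entries cover letters that
 // strtoupper() leaves untouched
 $sIn = strtr( $sIn, array(
 'Â' => 'A', 'Ä' => 'A', 'À' => 'A', 'Ç' => 'S', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
 'Œ' => 'E', 'Î' => 'I', 'Ï' => 'I', 'Ô' => 'O', 'Ö' => 'O', 'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U',
 'â' => 'A', 'ä' => 'A', 'à' => 'A', 'ç' => 'S', 'è' => 'E', 'é' => 'E', 'ê' => 'E', 'ë' => 'E',
 'œ' => 'E', 'î' => 'I', 'ï' => 'I', 'ô' => 'O', 'ö' => 'O', 'ù' => 'U', 'û' => 'U', 'ü' => 'U' ) );
 // remove every character that is not a letter
 $sIn = preg_replace( '`[^A-Z]`', '', $sIn );
 // if only one character is left, return it padded with spaces
 if ( strlen( $sIn ) === 1 ) return str_pad( $sIn, 4 );
 // replace the main consonants
 $convIn = array( 'GUI', 'GUE', 'GA', 'GO', 'GU', 'CA', 'CO', 'CU',
'Q', 'CC', 'CK' );
 $convOut = array( 'KI', 'KE', 'KA', 'KO', 'K', 'KA', 'KO', 'KU', 'K',
'K', 'K' );
 $sIn = str_replace( $convIn, $convOut, $sIn );
 // replace vowels (except Y and the first character) with A
 $sIn = preg_replace( '`(?<!^)[EIOU]`', 'A', $sIn );
 // replace prefixes, keeping the first letter, and make extra replacements
 $convIn = array( '`^KN`', '`^(PH|PF)`', '`^MAC`', '`^SCH`', '`^ASA`',
'`(?<!^)KN`', '`(?<!^)(PH|PF)`', '`(?<!^)MAC`', '`(?<!^)SCH`',
'`(?<!^)ASA`' );
 $convOut = array( 'NN', 'FF', 'MCC', 'SSS', 'AZA', 'NN', 'FF', 'MCC',
'SSS', 'AZA' );
 $sIn = preg_replace( $convIn, $convOut, $sIn );
 // remove H unless it is part of CH or SH
 $sIn = preg_replace( '`(?<![CS])H`', '', $sIn );
 // remove Y unless it follows an A
 $sIn = preg_replace( '`(?<!A)Y`', '', $sIn );
 // remove a trailing A, T, D or S
 $sIn = preg_replace( '`[ATDS]$`', '', $sIn );
 // remove every A except a leading one
 $sIn = preg_replace( '`(?!^)A`', '', $sIn );
 // collapse repeated letters
 $sIn = preg_replace( '`(.)\1`', '$1', $sIn );
 // keep only 4 characters, padding with spaces
 return substr( str_pad( $sIn, 4 ), 0, 4 );
}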
?>

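dcallaghan at linuxmail dot org

22 years ago

The standard soundex string is 4 characters long, and that is what the PHP function returns. However, some database programs return strings of arbitrary length; MySQL is one example.

The MySQL documentation mentions this and suggests using substring to output the standard 4 characters. Let's take 'Dostoyevski' as an example.

select soundex("Dostoyevski")
returns D2312
select substring(soundex("Dostoyevski"), 1, 4);
returns D231

PHP returns 'D231' as the value.

So, to use the soundex function to build the WHERE clause of a MySQL SELECT statement, you could try this:
$s = soundex('Dostoyevski');
$sql = 'SELECT * FROM authors WHERE substring(soundex(lastname), 1, 4) = "' . $s . '"';

Or, if you want to bypass the PHP function:
$result = mysql_query("select soundex('Dostoyevski')");
$s = mysql_result($result, 0, 0);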
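The mysql_* functions used above were removed in PHP 7.0. The sketch below redoes the first approach with a bound parameter instead of string concatenation. It assumes a PDO connection and reuses the authors/lastname names from the example. build_soundex_query() is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a parameterised query that compares the
// first four characters of MySQL's SOUNDEX() with PHP's 4-character key.
// Table and column names (authors, lastname) come from the note above.
function build_soundex_query(string $lastname): array
{
    $sql = 'SELECT * FROM authors WHERE SUBSTRING(SOUNDEX(lastname), 1, 4) = ?';
    return [$sql, [soundex($lastname)]];
}

// Usage with an existing PDO connection $pdo (assumed):
// [$sql, $params] = build_soundex_query('Dostoyevski');
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```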

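administrator at zinious dot com

22 years ago

I wrote this function long ago in CGI Perl and then translated it (if you can call it that) into PHP. It's clunky, to say the least, but it should handle the true soundex specification 100%.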

// ---code start---

function MakeSoundEx($stringtomakesoundexof)
{
    // Letter groups and the digit each one stands for. The last group
    // (vowels plus H, W, Y) adds no digit but separates repeated digits.
    $SoundKeys = array(
        "1" => "BPFV",
        "2" => "CSKGJQXZ",
        "3" => "DT",
        "4" => "L",
        "5" => "MN",
        "6" => "R",
        ""  => "AEHIOUWY",
    );

    $temp_Name = strtoupper($stringtomakesoundexof);
    if ($temp_Name === "") {
        return "0000";
    }

    // Digit for one character, "" for separators, null for non-letters
    $codeOf = function ($char) use ($SoundKeys) {
        foreach ($SoundKeys as $digit => $letters) {
            if (strpos($letters, $char) !== false) {
                return (string) $digit;
            }
        }
        return null;
    };

    // Keep the first letter; its digit still blocks an identical next digit
    $temp_Soundex = $temp_Name[0];
    $temp_Last = (string) $codeOf($temp_Soundex);

    for ($n = 1; $n < strlen($temp_Name) && strlen($temp_Soundex) < 4; $n++) {
        $code = $codeOf($temp_Name[$n]);
        if ($code === null) {
            continue; // ignore anything that is not a letter
        }
        if ($code !== "" && $code !== $temp_Last) {
            $temp_Soundex .= $code;
        }
        $temp_Last = $code;
    }

    // Pad with zeros to four characters
    return str_pad($temp_Soundex, 4, "0");
}

// ---code end---

witold4249 at rogers dot com

22 years ago

A simpler way to check words for similarity, while avoiding the Klancy/Clancy problem, is to prepend any letter to the string.

For example: OKlancy/OClancy
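Here is a sketch of the prefix trick; prefixed_soundex() is a hypothetical helper. With the same letter in front, the words' real first letters are encoded as digits, so K and C both become 2. The key is still only four characters long, so the prefix costs one digit of precision.

```php
<?php
// Hypothetical helper: prepend the same letter to every word before taking
// its soundex key, so the word's own first letter is encoded as a digit
function prefixed_soundex(string $word, string $prefix = 'O'): string
{
    return soundex($prefix . $word);
}

echo soundex('Klancy'), ' ', soundex('Clancy'), "\n";                   // K452 C452
echo prefixed_soundex('Klancy'), ' ', prefixed_soundex('Clancy'), "\n"; // O245 O245
```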

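mail at gettheeawayspam dot iaindooley dot com

21 years ago

The soundex "different first letter" problem can be worked around by applying levenshtein() to the soundex codes. My application searches a database of album names for entries matching a user-supplied string, and it does the following:

1. Search the database for an exactly matching name.
2. Search the database for entries that contain the string in their name.
3. Search the database for entries containing any word of the string (if the user entered more than one word), excluding small words such as "and", "the" and "of".
4. If all of the above fail, use a fallback:

- Compute the Levenshtein distance (levenshtein()) between the user's search term and each entry in the database, as a percentage of the length of the search term.

- Compute the Levenshtein distance between the Metaphone code of the search term and that of each field in the database, as a percentage of the length of the search term's Metaphone code.

- Compute the Levenshtein distance between the Soundex code of the search term and that of each field in the database, as a percentage of the length of the search term's Soundex code.

If any of these percentages is less than 50, the entry is accepted as a possible match. (This means two Soundex codes with different first letters will be accepted!)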
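The fallback step can be sketched with PHP's levenshtein(), metaphone() and soundex(). The helper name is hypothetical. The 50 percent threshold follows the description above, but exposing it as a parameter is an assumption.

```php
<?php
// Sketch of the fallback step (hypothetical helper). Each distance is
// expressed as a percentage of the length of the user's term, or of its
// metaphone / soundex key; any value below the threshold accepts the entry.
function fuzzy_match(string $query, string $candidate, float $threshold = 50.0): bool
{
    $pairs = [
        [$query, $candidate],
        [metaphone($query), metaphone($candidate)],
        [soundex($query), soundex($candidate)],
    ];
    foreach ($pairs as [$a, $b]) {
        // max(1, ...) guards against an empty key
        $percent = 100 * levenshtein($a, $b) / max(1, strlen($a));
        if ($percent < $threshold) {
            return true;
        }
    }
    return false;
}
```

Checking the cheap raw-string distance first means the phonetic keys are only computed when the spelling alone is not close enough.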

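justin at NO dot blukrew dot SPAM dot com

20 years ago

I first looked into soundex() because I wanted to compare how individual letters sound. The aim was that a string of generated characters, when read out, could easily be told apart letter by letter (for example, TGDE is hard to distinguish, whereas RFQA is much easier to understand). The goal was to generate IDs that can be understood with high accuracy even over radios of varying quality. I soon found that soundex and metaphone can't do this (they work on words), so I wrote the following code to help. The ID generation function calls chrSoundAlike() repeatedly, comparing each new character with the preceding ones. I'd be interested in any feedback on this. Thanks.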

<?php
function chrSoundAlike($char1, $char2, $opts = FALSE) {
 $char1 = strtoupper($char1);
 $char2 = strtoupper($char2);
 $opts = strtoupper($opts);

 // Define the sets of similar-sounding characters.
 // (Options: NUMBERS adds '3', STRICT adds 'W', BOTH adds both; the default adds neither.)
 switch ($opts) {
 case 'NUMBERS':
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U'));
 break;

 case 'STRICT':
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U', 'W'));
 break;
 
 case 'BOTH':
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U', 'W'));
 break;

 default:
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U'));
 break;
 }
 
 // Find the set that contains $char1, if any.
 $matchset = array();
 for ($i = 0; $i < count($sets); $i++) {
 if (in_array($char1, $sets[$i])) {
 $matchset = $sets[$i];
 }
 }

 // Return true if $char2 is in the same set as $char1, or if the two characters are identical.
 if (in_array($char2, $matchset) OR $char1 == $char2) {
 return TRUE;
 } else {
 return FALSE;
 }
}
?>
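The note does not show the ID generation function itself. The following is a hypothetical sketch that assumes the default sets from chrSoundAlike() above:
- Each letter gets a sound class.
- A random letter is accepted only if no earlier character in the ID shares its class. This is equivalent to calling chrSoundAlike() against every previous character.

Because each class can appear only once, these sets allow at most 11 characters per ID.

```php
<?php
// Hypothetical sketch of a radio-friendly ID generator
function generate_radio_id(int $length): string
{
    // Default sound-alike sets from chrSoundAlike() above
    $sets = [
        ['A', 'J', 'K'],
        ['B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'],
        ['F', 'S', 'X'],
        ['I', 'Y'],
        ['M', 'N'],
        ['Q', 'U'],
    ];
    $alphabet = range('A', 'Z');

    // Give every letter a "sound class": its set, or itself if unlisted
    $classOf = [];
    foreach ($alphabet as $letter) {
        $classOf[$letter] = $letter;
    }
    foreach ($sets as $index => $letters) {
        foreach ($letters as $letter) {
            $classOf[$letter] = 'set' . $index;
        }
    }

    // Each class may appear at most once, which caps the ID length
    $classes = count(array_unique($classOf));
    if ($length < 0 || $length > $classes) {
        throw new InvalidArgumentException("length must be between 0 and $classes");
    }

    $id = '';
    $used = [];
    while (strlen($id) < $length) {
        $candidate = $alphabet[random_int(0, count($alphabet) - 1)];
        // Reject letters that sound like one already in the ID
        if (!isset($used[$classOf[$candidate]])) {
            $id .= $candidate;
            $used[$classOf[$candidate]] = true;
        }
    }
    return $id;
}
```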

fie at myrealbox dot com

21 years ago

Oops... the host on that server was shut down... here is the code from before:

function cg_sylc($nos){
$nos = strtoupper($nos);
$syllables = 0;

// Each pair of adjacent vowels counts as one syllable
$before = strlen($nos);
$nos = str_replace(array('AA','AE','AI','AO','AU',
'EA','EE','EI','EO','EU','IA','IE','II','IO',
'IU','OA','OE','OI','OO','OU','UA','UE',
'UI','UO','UU'), "", $nos);
$after = strlen($nos);
$diference = $before - $after;
if($before != $after) $syllables += $diference / 2;

// A trailing E is not counted; a trailing Y counts as an extra syllable
$lastChar = substr($nos, -1);
if($lastChar == "E") $syllables --;
if($lastChar == "Y") $syllables ++;

// Each remaining single vowel counts as one syllable
$before = $after;
$nos = str_replace(array('A','E','I','O','U'),"",$nos);
$after = strlen($nos);
$syllables += ($before - $after);

return $syllables;
}

function cg_SoundEx($SExStr){
$syl = cg_sylc($SExStr);
// Keep letters only, upper-cased
$SExStr = strtoupper(preg_replace('/[^A-Za-z]/', '', $SExStr));
if ($SExStr === '') return '';

// Digit for every letter; vowels, H, W and Y become 0 so that they
// still separate repeated digits
$codes = strtr($SExStr, 'AEHIOUWYBFPVCGJKQSXZDTLMNR', '00000000111122222222334556');

$tsstr = '';
for ($i = 1; $i < strlen($codes); $i++) {
  // skip vowels and digits equal to the previous letter's digit
  if ($codes[$i] !== '0' && $codes[$i] !== $codes[$i - 1]) {
    $tsstr .= $codes[$i];
  }
}

// First letter, exactly three digits (zero-padded), then the syllable count
return $SExStr[0] . substr(str_pad($tsstr, 3, '0'), 0, 3) . $syl;
}

Anonymous

22 years ago

An easier way to do the search above is to prepend any letter to the string before comparing.

For example: Klancy => LKlancy
Clancy => LClancy

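Anonymous

19 years ago

The phonetic output includes the first letter, so it is worth pointing out a workaround for names like klansy and clansy, which get different keys. If you want your soundex index to match them anyway, take the substring after the first letter. The first letter is the word's leading consonant, while the digits describe the word's phonetic structure.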

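pee whitt at dental dot ufl dor edu

21 years ago

fie at myrealbox dot com-

About your request for soundex syllables: I think counting the vowel groups in a word would give an accurate syllable count. So no soundex feature is needed. Just walk through the characters of the word and increase the syllable count each time you move from a vowel to a consonant.

Using this logic, this sentence would be classified as follows:
2 1 2 1 1 (3) (0) (4) (0) 2

Here, (#) marks words that were classified incorrectly. I believe that with a little thought, the logic for these cases could be worked out, giving an accurate count. Counting vowel-to-consonant changes instead gives:
(1) 1 2 1 2 1 (4) 1 2

Averaging the two methods and rounding the result up fixes most of the errors.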
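Here is a sketch of that suggestion. The helper names are hypothetical, and treating Y as a vowel is an assumption. It counts vowel groups, counts vowel-to-consonant transitions, then averages the two and rounds up.

```php
<?php
// Method 1: number of groups of consecutive vowels (Y counted as a vowel)
function count_vowel_groups(string $word): int
{
    return preg_match_all('/[aeiouy]+/i', $word);
}

// Method 2: number of vowels directly followed by a consonant
function count_vowel_to_consonant(string $word): int
{
    return preg_match_all('/[aeiouy](?=[b-df-hj-np-tv-xz])/i', $word);
}

// Average of both methods, rounded up; at least one syllable
function estimate_syllables(string $word): int
{
    $average = (count_vowel_groups($word) + count_vowel_to_consonant($word)) / 2;
    return max(1, (int) ceil($average));
}

echo estimate_syllables('table'), "\n"; // 2 (methods give 2 and 1)
```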

crchafer-php at c2se dot com

19 years ago

It could be rewritten, perhaps; but the algorithm has some obvious
places that can be optimized, for example...

function text__soundex( $text ) {
$k = ' 123 12  22455 12623 1 2 2';
$nl = strlen( $tN = strtoupper( $text ) );
$p = trim( $k[ ord( $tS = $tN[0] ) - 65 ] );
for( $n = 1; $n < $nl; ++$n )
if( ( $l = trim( $k[ ord( $tN[ $n ] ) - 65 ] ) ) != $p )
$tS .= ( $p = $l );
return substr( $tS . '000', 0, 4 );
        }

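// Notes
// $k is $key: one character per letter A-Z, essentially the inverse of $SoundKey
// $tN is the upper-cased input text
// $tS is the output generated so far
// $l is the current letter's code, $p is the previous letter's code
// $n is the loop index, $nl is the length of the text
// 65 is ord('A'), precomputed for speed
// non-ASCII letters are not supported
// watch the parentheses, things are rather tangled here

(The code has only been lightly tested, but it seems
to match the output of PHP's soundex(). Speed is untested,
though with most of the loops and comparisons removed it should be /much/ faster
than a4_perfect's rewrite.)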

C
2005-09-13

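Marc Quinton.

19 years ago

A French version of soundex. It could also serve as a model for other languages that lack a soundex. Perhaps one could write a class that captures the specifics of each language.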

http://www.php-help.net/sources-php/a.french.adapted.soundex.289.html

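shortcut

18 years ago

On the question of whether soundex works, apart from the first letter, for klancy vs. clancy: the answer is to always add the same prefix letter to the words.

aklancy will match aclancy
bklancy will match bclancy

Does soundex only check the first 2 syllables?
For example: spectacular will match spectacle

Just something to keep in mind if you rely on soundex.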

k-

jr

21 years ago

One workaround for the differences between the MySQL and PHP soundex implementations is to do the soundex comparison entirely within MySQL.

For example:
$sql = "SELECT * FROM table WHERE substring(soundex(field), 1, 4) = substring(soundex('".$wordsearch."'), 1, 4)";
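Concatenating $wordsearch into the SQL, as above, is open to SQL injection. This sketch performs the same comparison with a bound parameter. The helper name is hypothetical. `table` and `field` are the note's placeholder names, and `table` is backquoted because it is a reserved word in MySQL.

```php
<?php
// Hypothetical helper: do the whole comparison in MySQL, but bind the
// search word as a parameter instead of splicing it into the SQL text.
function build_mysql_soundex_query(string $wordsearch): array
{
    $sql = 'SELECT * FROM `table` '
         . 'WHERE SUBSTRING(SOUNDEX(field), 1, 4) = SUBSTRING(SOUNDEX(?), 1, 4)';
    return [$sql, [$wordsearch]];
}

// Usage with an existing PDO connection $pdo (assumed):
// [$sql, $params] = build_mysql_soundex_query($wordsearch);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```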

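info at nederlandsch dot net

21 years ago

MySQL's soundex (3.23.49) does not check at all whether the first character should be skipped. So "'s-Gravenhage", the Dutch name of The Hague (seat of the Dutch government), gets a soundex value of '261 in MySQL but S615 in PHP.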
