get_meta_tags

(PHP 4, PHP 5, PHP 7, PHP 8)

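get_meta_tags - Extracts all meta tag content attributes from a file and returns an array

Description

get_meta_tags(string $filename, bool $use_include_path = false): array|false

Opens filename and parses it line by line for <meta> tags. The parsing stops at </head>.

Parameters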

filename

The path to the HTML file, as a string. This can be a local file or a URL.

Example #1 What get_meta_tags() parses

<meta name="author" content="name">
<meta name="keywords" content="php documentation">
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
</head> <!-- parsing stops here -->

use_include_path

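Setting use_include_path to true will result in PHP trying to open the file along the standard include path as per the include_path directive. This is used for local files, not URLs.

Return Values

Returns an array with all the parsed meta tags.

The value of the name attribute becomes the key, and the value of the content attribute becomes the value in the returned array, so you can easily use standard array functions to traverse it or access single values. Special characters in the value of the name attribute are replaced with '_', and the rest is converted to lower case. If two meta tags have the same name, only the last one is returned.

Returns false on failure.

Examples

Example #2 What get_meta_tags() returns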

<?php
// Assuming the above tags are at www.example.com
$tags = get_meta_tags('http://www.example.com/');

// Notice how the keys are all lowercase now, and
// how . was replaced by _ in the key.
echo $tags['author']; // name
echo $tags['keywords']; // php documentation
echo $tags['description']; // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>

Notes

Note:
Only meta tags with name attributes will be parsed. Quotes are not required.
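
The key rules described under Return Values and the note above can be checked against a local file. The following is a minimal sketch, assuming a throwaway temporary file; the sample markup, the `ExampleCMS` value, and the `$path` variable are made up for illustration and do not come from the manual.

```php
<?php
// Illustrative markup (an assumption for this sketch, not from the manual).
$html = <<<'HTML'
<html>
<head>
<meta charset="utf-8">
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
<meta name="keywords" content="first">
<meta name="keywords" content="second">
<meta name=generator content=ExampleCMS>
<meta property="og:type" content="article">
</head>
<body></body>
</html>
HTML;

// get_meta_tags() reads from a file or URL, so write the markup to a temp file.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

$tags = get_meta_tags($path);
unlink($path);

var_dump($tags['description']);    // the key was lowercased
var_dump($tags['geo_position']);   // "." in the name became "_"
var_dump($tags['keywords']);       // only the last duplicate is kept
var_dump($tags['generator']);      // unquoted attribute values are accepted
var_dump(isset($tags['og:type'])); // no name attribute, so the tag is skipped

// A missing file yields false (and a warning, suppressed here with @).
var_dump(@get_meta_tags($path));
```

Because failure is signalled with false rather than an empty array, a strict comparison (`=== false`) is the safer check before iterating over the result.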

See Also

htmlentities() - Convert all applicable characters to HTML entities
urlencode() - URL-encodes string

User Contributed Notes 19 notes

bobble bubble ¶

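9 years ago

This regex gets the meta tags independent of attribute order by capturing inside a lookahead.
It also uses the branch reset feature to deal with the different quoting styles of the values.
The pattern can be tested here: https://regex101.com/r/oE4oU9/1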

<?PHP

function getMetaTags($str)
{
 $pattern = '
 ~<\s*meta\s

 # capture the type to $1 using a lookahead
 (?=[^>]*?
 \b(?:name|property|http-equiv)\s*=\s*
 (?|"\s*([^"]*?)\s*"|\'\s*([^\']*?)\s*\'|
 ([^"\'>]*?)(?=\s*/?\s*>|\s\w+\s*=))
 )

 # capture the content to $2
 [^>]*?\bcontent\s*=\s*
 (?|"\s*([^"]*?)\s*"|\'\s*([^\']*?)\s*\'|
 ([^"\'>]*?)(?=\s*/?\s*>|\s\w+\s*=))
 [^>]*>

 ~ix';
 
 if(preg_match_all($pattern, $str, $out))
 return array_combine($out[1], $out[2]);
 return array();
}

// usage
$meta_tags = getMetaTags($str);

?>

jp at webgraphe dot com ¶

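20 years ago

If the URL redirects using headers (as you would with the PHP function header("Location: URL");), the page usually has no content. It appears that get_meta_tags() does not catch that kind of redirection (as cURL would), which caused my script to time out.

I ran into this in a crawler I wrote to enter every available page of my website into my database; one of the links pointed to a page containing only the following code: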

<?php
 header("Location: sections.php?section=home");
 exit();
?>

That made my script hang for a while, and apparently get_meta_tags() could not even return an error to me.

JP.

Ebpo ¶

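11 years ago

Be aware that the function looks for meta tags throughout the whole page. If for some reason one of the meta tags is commented out in your code, it will still be grabbed.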
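
One possible way around the behaviour described above is to strip HTML comments before parsing. This is only a sketch: `meta_tags_without_comments()` is a hypothetical helper, not a PHP built-in, and the simple comment pattern does not try to handle comment-like text inside scripts or conditional comments.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): strips HTML comments from a
 * string, then runs get_meta_tags() on the result via a temporary file.
 *
 * @return array|false
 */
function meta_tags_without_comments(string $html)
{
    // Drop <!-- ... --> blocks; the s modifier lets "." span newlines.
    $clean = preg_replace('/<!--.*?-->/s', '', $html);

    // get_meta_tags() needs a file or URL, so go through a temp file.
    $path = tempnam(sys_get_temp_dir(), 'meta');
    file_put_contents($path, $clean);
    $tags = get_meta_tags($path);
    unlink($path);

    return $tags;
}

// Made-up markup with one commented-out tag and one live tag.
$html = '<head>'
      . '<!-- <meta name="robots" content="noindex"> -->'
      . '<meta name="author" content="name">'
      . '</head>';

print_r(meta_tags_without_comments($html));
```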

richard dot dern at athaliasoft dot fr ¶

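11 years ago

Personally, I have had fewer problems using DOM functions rather than regular expressions when extracting meta tags without the get_meta_tags function (in order to get http-equiv meta tags as well).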

<?php

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

$nodes = $xpath->query('//head/meta');

foreach($nodes as $node) {
 [...]
}

?>
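
The loop body in the note above is elided. As a separate illustration, here is one way such a DOM-based extraction could look. It is a sketch that assumes the dom extension is available; the sample markup and the choice to key the result by name, property, or http-equiv are assumptions, not the note author's code.

```php
<?php
// Sample markup; an assumption for illustration only.
$html = <<<'HTML'
<html><head>
<meta http-equiv="Content-Language" content="en">
<meta name="author" content="name">
<meta property="og:type" content="article">
</head><body></body></html>
HTML;

// Collect parser warnings internally instead of emitting them for sloppy markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$meta = array();

foreach ($xpath->query('//head/meta[@content]') as $node) {
    // Use whichever identifying attribute the tag carries.
    foreach (array('name', 'property', 'http-equiv') as $attr) {
        if ($node->hasAttribute($attr)) {
            $meta[$node->getAttribute($attr)] = $node->getAttribute('content');
            break;
        }
    }
}

print_r($meta);
```

Unlike get_meta_tags(), this keeps the attribute value as written, so keys are neither lowercased nor have their special characters replaced.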

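Anonymous ¶

22 years ago

Tested with PHP 4.0.6


get_meta_tags() seems to look only at the beginning of the file, i.e. if there is a lot of PHP code before the HTML header, it will return nothing...
Tested using get_meta_tags() on a local file with about 9000 characters of PHP code before the HTML header.


Workaround: move the code after the header if you can, or if you cannot, include a file.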

richard at pifmagazine dot com ¶

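24 years ago

An important note about META tags and this function: if your META tag contains a newline character "\n", get_meta_tags() will return a NULL value for that name attribute. Removing the newline from the source META tag fixes the problem.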

rehfeld ¶

19 years ago

In reply to
jp at webgraphe dot com

This function grabs meta tags, not HTTP headers.

If you need the headers:

<?php

$fp = fopen('http://example.org/somepage.html', 'r');

// the variable $http_response_header magically appears
print_r($http_response_header);

// or
$meta_data = stream_get_meta_data($fp);
print_r($meta_data);

?>
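
Once the header lines are available, a redirect such as the one jp describes can be detected by looking for a Location header. The following sketch works on a plain array of header lines in the same shape as $http_response_header; `find_location_header()` is a hypothetical helper and the sample lines are made up, so no network access is involved.

```php
<?php
/**
 * Hypothetical helper: returns the value of the last Location header in a
 * list of raw header lines, or null when there is none.
 */
function find_location_header(array $headers): ?string
{
    $location = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive; stripos() === 0 means "starts with".
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return $location;
}

// Sample header lines in the shape of $http_response_header (made up for this sketch).
$headers = array(
    'HTTP/1.1 302 Found',
    'Location: sections.php?section=home',
    'Content-Type: text/html',
);

var_dump(find_location_header($headers));
```

The helper keeps the last match rather than the first, on the assumption that a header list covering several redirected responses should report the final target.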

mariano at cricava dot com ¶

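19 years ago

Based on Michael Knapp's code, with some regular expressions added, here is a function that gets all meta tags and the title for a given URL. It returns false on error. It relies on the included getUrlContents() function, which handles META REFRESH redirections and follows them up to the specified number of redirections. Note that the regular expressions included here are split into concatenated strings because php.net complained that the line was too long ;)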

<?php
function getUrlData($url)
{
 $result = false;
 
 $contents = getUrlContents($url);

 if (isset($contents) && is_string($contents))
 {
 $title = null;
 $metaTags = null;
 
 preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );

 if (isset($match) && is_array($match) && count($match) > 0)
 {
 $title = strip_tags($match[1]);
 }
 
 preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
 
 if (isset($match) && is_array($match) && count($match) == 3)
 {
 $originals = $match[0];
 $names = $match[1];
 $values = $match[2];
 
 if (count($originals) == count($names) && count($names) == count($values))
 {
 $metaTags = array();
 
 for ($i=0, $limiti=count($names); $i < $limiti; $i++)
 {
 $metaTags[$names[$i]] = array (
 'html' => htmlentities($originals[$i]),
 'value' => $values[$i]
 );
 }
 }
 }
 
 $result = array (
 'title' => $title,
 'metaTags' => $metaTags
 );
 }
 
 return $result;
}

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
 $result = false;
 
 $contents = @file_get_contents($url);
 
 // Check if we need to follow a redirection
 
 if (isset($contents) && is_string($contents))
 {
 preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
 
 if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
 {
 if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
 {
 return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
 }
 
 $result = false;
 }
 else
 {
 $result = $contents;
 }
 }
 
 return $result;
}
?>

Here is a usage example. Check that the URL used has a META REFRESH redirection.

<?php
$result = getUrlData('http://www.marianoiglesias.com.ar/');

echo '<pre>'; print_r($result); echo '</pre>';

?>

For the code above, the output will be:

<?php
Array
(
 [title] => Mariano Iglesias: El Eternauta 
 [metaTags] => Array
 (
 [description] => Array
 (
 [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
 [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
 )

 [DC.title] => Array
 (
 [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
 [value] => Mariano Iglesias - Weblog
 )

 [ICBM] => Array
 (
 [html] => <meta name="ICBM" content="-34.6017, -58.3956" />
 [value] => -34.6017, -58.3956
 )

 [geo.position] => Array
 (
 [html] => <meta name="geo.position" content="-34.6017;-58.3956" />
 [value] => -34.6017;-58.3956
 )

 [geo.region] => Array
 (
 [html] => <meta name="geo.region" content="AR-BA">
 [value] => AR-BA
 )

 [geo.placename] => Array
 (
 [html] => <meta name="geo.placename" content="Buenos Aires">
 [value] => Buenos Aires
 )

 )

)
?>
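
The redirect detection inside getUrlContents() can also be looked at in isolation. The sketch below is a simplified variant rather than the note's exact pattern: `find_meta_refresh_url()` is a hypothetical helper, it accepts single or double quotes, and it assumes http-equiv appears before content in the tag.

```php
<?php
/**
 * Hypothetical helper: returns the target URL of the first
 * <meta http-equiv="refresh"> tag in $html, or null if none is found.
 */
function find_meta_refresh_url(string $html): ?string
{
    // http-equiv must precede content; the delay is digits followed by ";".
    $pattern = '/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']?\s*\d+\s*;\s*url\s*=\s*([^"\'>\s]+)/i';

    if (preg_match($pattern, $html, $match)) {
        return $match[1];
    }
    return null;
}

// Made-up tag for illustration.
$html = '<meta http-equiv="refresh" content="0; URL=http://www.example.com/new">';
var_dump(find_meta_refresh_url($html));
```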

LWC ¶

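9 years ago

Based on the work of mariano at cricava dot com, here is a new version that adds:
1) Support for meta properties (e.g. Facebook's og tags).
2) Support for Unicode (UTF-8) encoded meta lines.
3) An option not to convert to htmlentities, in case you intend to actually use the results rather than just display them.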

<?php
function getUrlData($url, $raw=false) // $raw - enable for raw display
{
$result = false;
   
$contents = getUrlContents($url);

if (isset($contents) && is_string($contents))
    {
$title = null;
$metaTags = null;
$metaProperties = null;
       
preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );

if (isset($match) && is_array($match) && count($match) > 0)
        {
$title = strip_tags($match[1]);
        }
       
preg_match_all('/<[\s]*meta[\s]*(name|property)="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
       
if (isset($match) && is_array($match) && count($match) == 4)
        {
$originals = $match[0];
$names = $match[2];
$values = $match[3];
           
if (count($originals) == count($names) && count($names) == count($values))
            {
$metaTags = array();
$metaProperties = $metaTags;
if ($raw) {
if (version_compare(PHP_VERSION, '5.4.0') == -1)
$flags = ENT_COMPAT;
else
$flags = ENT_COMPAT | ENT_HTML401;
                }
               
for ($i=0, $limiti=count($names); $i < $limiti; $i++)
                {
if ($match[1][$i] == 'name')
$meta_type = 'metaTags';
else
$meta_type = 'metaProperties';
if ($raw)
${$meta_type}[$names[$i]] = array (
'html' => htmlentities($originals[$i], $flags, 'UTF-8'),
'value' => $values[$i]
                        );
else
${$meta_type}[$names[$i]] = array (
'html' => $originals[$i],
'value' => $values[$i]
                        );
                }
            }
        }
       
$result = array (
'title' => $title,
'metaTags' => $metaTags,
'metaProperties' => $metaProperties,
        );
    }
   
return $result;
}

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
$result = false;
   
$contents = @file_get_contents($url);
   
// Check if we need to go somewhere else
   
if (isset($contents) && is_string($contents))
    {
preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
       
if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
        {
if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
            {
return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }
           
$result = false;
        }
else
        {
$result = $contents;
        }
    }
   
return $result;
}
?>

<?php
$result = getUrlData('http://whatever...', true);

echo '<pre>'; print_r($result); echo '</pre>';

?>

Example output:

<?php
Array
(
 [title] => The requested page's title
 [metaTags] => Array
 (
 [description] => Array
 (
 [html] => <meta name="description" content="Something..." />
 [value] => Something...
 )
 )
 [metaProperties] => Array
 (
 [og:type] => Array
 (
 [html] => <meta property="og:type" content="article"/>
 [value] => article
 )
 )
)
?>

roganty at gmail dot com ¶

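18 years ago

This is a slight modification of the function by jimmyxx at gmail dot com.

I tried to use the regular expression shown in his code, but PHP threw some errors.

Here is the corrected regular expression, which works
(note that I had to split the regular expression into concatenated strings because php.net complained that the line was too long)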
<?php
preg_match_all(
 "|<meta[^>]+name=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]+>|i",
 $html, $out,PREG_PATTERN_ORDER);
?>

The problem was that the quotes were escaped incorrectly.
I hope this helps anyone who had trouble using his code.

tim dot bennett at haveaniceplay dot com ¶

19 years ago

If you want to get the contents of tags other than meta tags, you can use:

<?php

$page = "http://www.mysite.com/apage.php";

// the tags
$start = '<atag>';
$end = '<\/atag>';

// open the file
$fp = fopen( $page, 'r' );

$cont = "";

// read the contents
while( !feof( $fp ) ) {
    $buf = trim( fgets( $fp, 4096 ) );
    $cont .= $buf;
}

// get the tag contents
preg_match( "/$start(.*)$end/s", $cont, $match );

// the tag contents
$contents = $match[ 1 ];

?>

Michael Knapp ¶

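19 years ago

Tim's code is good (thanks, Tim), but it does not work well if the tag is part of a long string with no line breaks.

For example, try getting the title from Google Maps (http://www.google.com/maps).

A better solution is: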

<?php
$title = "";

if ($fp = @fopen( $_POST['url'], 'r' )) {

    $cont = "";

    // read the contents
    while( !feof( $fp ) ) {
        $buf = trim(fgets( $fp, 4096 )) ;
        $cont .= $buf;
    }

    // get the tag contents
    @preg_match( "/<title>([a-z 0-9]*)<\/title>/si", $cont, $match );

    // the tag contents
    $title = strip_tags(@$match[ 1 ]);
}

?>

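Note the use of strip_tags. Another thing to watch for is the characters ", <, and >. If you are going to output the result into a form, you need to strip these characters out.

Also, it is best to use the /i modifier, because some people may write <TITLE> and so on...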
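
As an alternative to removing ", <, and > outright, the value can be escaped before it is written into a form field. This swaps in htmlspecialchars() for the removal the note suggests; the sample title and the `$safe` variable are made up for this sketch.

```php
<?php
// A made-up title containing characters that are significant in HTML.
$title = 'Maps "beta" <b>& more</b>';

// strip_tags() removes markup; htmlspecialchars() then escapes what is left
// so the value can be placed inside an attribute such as value="...".
$safe = htmlspecialchars(strip_tags($title), ENT_QUOTES, 'UTF-8');

echo '<input type="text" name="title" value="' . $safe . '">';
```

Escaping keeps the original characters visible to the user, whereas stripping them silently changes the title.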

richard at pifmagazine dot com ¶

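24 years ago

Not mentioned above, but worth noting: when get_meta_tags is used on a remote PHP page, the page is parsed before the meta tags are returned, so you can capture meta tags that are generated dynamically (by PHP) on the remote side.


This does **not** work the same way when getting meta tags on the local filesystem. A local file is not parsed by the web server before being handed to get_meta_tags(). If the META tags are hard-coded in the page, that is fine, but if they are generated dynamically, you will not be able to capture them unless you use a full URL when calling the local file.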

doob_ at gmx dot de ¶

16 years ago

<?php 
 
/*
** Fetch and format meta tag contents
*/

function get_meta_data($url, $searchkey='') {
    $str = ''; // initialise so the function always returns a string
    $data = get_meta_tags($url); // get the meta data as an array
    if ($data === false) { // the file or URL could not be opened
        return $str;
    }
    foreach($data as $key => $value) {
        if(mb_detect_encoding($value, 'UTF-8, ISO-8859-1', true) != 'ISO-8859-1') { // check whether the content is UTF-8 or ISO-8859-1
            $value = utf8_decode($value); // if UTF-8, decode it (utf8_decode() is deprecated as of PHP 8.2)
        }
        $value = strtr($value, get_html_translation_table(HTML_ENTITIES)); // mask the content
        if($searchkey != '') { // if only one meta tag is wanted, e.g. 'description'
            if($key == $searchkey) {
                $str = $value; // return only the value
            }
        } else { // all meta tags
            $pattern = '/ |,/i'; // ' ' or ','
            $array = preg_split($pattern, $value, -1, PREG_SPLIT_NO_EMPTY); // split into an array so the words can be counted
            $str .= '<p><span style="display:block;color:#000000;font-weight:bold;">' . $key . ' <span style="font-weight:normal;">(' . count($array) . ' words | ' . strlen($value) . ' chars)</span></span>' . $value . '</p>'; // format the data, including word and character counts
        }
    }
    return $str;
}
 
$content .= get_meta_data("http://www.example.com/");
/*
The output looks like this:

description (23 words | 167 chars)
SELFHTML 8.1.2 - Die bekannte Dokumentation zu HTML, JavaScript und CGI/Perl - Tutorial und Referenz, mit etlichen Zusatztips zu Design, Grafik, Projektverwaltung usw.

keywords (13 words | 119 chars)
SELFHTML, HTML, Dynamic HTML, JavaScript, CGI, Perl, Grafik, WWW-Seiten, Web-Seiten, Hilfe, Dokumentation, Beschreibung

and so on

*/

$content .= get_meta_data("http://www.example.com/", "description");
/*
The output looks like this:

SELFHTML 8.1.2 - Die bekannte Dokumentation zu HTML, JavaScript und CGI/Perl - Tutorial und Referenz, mit etlichen Zusatztips zu Design, Grafik, Projektverwaltung usw.
*/
 
?>
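
The note above converts UTF-8 values down to ISO-8859-1. Going the other way, normalizing every value to UTF-8, is a common alternative; the sketch below shows that direction and is not the note author's approach. `meta_value_to_utf8()` is a hypothetical helper, it requires the mbstring extension, and it assumes that any value which is not valid UTF-8 is ISO-8859-1.

```php
<?php
/**
 * Hypothetical helper: returns $value as UTF-8, assuming that anything that
 * is not already valid UTF-8 is ISO-8859-1.
 */
function meta_value_to_utf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "Caf\xE9";     // "Café" encoded as ISO-8859-1
$utf8   = "Caf\xC3\xA9"; // "Café" encoded as UTF-8

var_dump(meta_value_to_utf8($latin1) === $utf8);
var_dump(meta_value_to_utf8($utf8) === $utf8);
```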

jstel at 126 dot com ¶

15 years ago

This function gets every meta tag from the HTML content and removes all JS and CSS.


<?php
function get_meta_data($content)
{
    $split_content = array(); // initialise so an array is returned even without matches
    $content = strtolower($content);
    $content = preg_replace("'<style[^>]*>.*</style>'siU",'',$content); // remove css
    $content = preg_replace("'<script[^>]*>.*</script>'siU",'',$content); // remove js
    $split = explode("\n",$content);
    foreach ($split as $k => $v)
    {
        if (strpos(' '.$v,'<meta')) {
            preg_match_all(
                "/<meta[^>]+(http\-equiv|name)=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>/i",
                $v, $split_content[],PREG_PATTERN_ORDER);
        }
    }
    return $split_content;
}
?>

Ben dot Davis at furman dot edu ¶

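23 years ago

I found that get_meta_tags is very slow for large searches. I built a large search engine for a site that could not use a database, and I first tried extracting the meta tags.
I found that extracting the meta tags with eregi was actually much faster. The code below extracts the description: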


// eregi() was removed in PHP 7.0; preg_match() with the i modifier matches the same way
if (preg_match("/<meta name=\"description\" content=[^>]*/i", $contents, $descresult))
{
    $description = explode("<meta name=\"description\" content=", $descresult[0]);
    echo "<font face=\"Arial\" size=2>$description[1]</font>";
}

Antonio - Malaga ¶

15 years ago

It does not work if the meta syntax has no trailing slash.

diel at caroes dot be ¶

16 years ago

Quick meta data grabber
[code]
if(get_meta_tags('http://'.$_POST['pagina'])){
    print '<font class="midden">Meta data from http://'.$_POST['pagina'].'</font>';
    $metadata = get_meta_tags('http://'.$_POST['pagina']);
    echo '<table width="100%">';
    print '<tr><td>Meta</td><td>Value</td></tr>';
    foreach($metadata as $naam => $waarde){
        echo '<tr><td valign="top">'.$naam.'</td><td>'.$waarde.'</td></tr>';
    }
    print '</table>';
}else{
    print '
<div class="red_h">Incorrect</div>
        ';
}
[/code]

jimmyxx at gmail dot com ¶

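19 years ago

I used this as part of my mini PHP search engine, and it really slowed the whole thing down. I wrote this function to read the HTML (just fetch the file or use something like snoopy) and extract the meta data with a simple regular expression. It works well and made my crawler much faster.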

<?php

function get_meta_data($html) {

 preg_match_all(
 "|<meta[^>]+name=\\"([^\\"]*)\\"[^>]+content=\\"([^\\"]*)\\"[^>]+>|i", $html, $out,PREG_PATTERN_ORDER);

 for ($i=0;$i < count($out[1]);$i++) {
 // loop through the meta data - add your own tags here if you need them
 if (strtolower($out[1][$i]) == "keywords") $meta['keywords'] = $out[2][$i];
 if (strtolower($out[1][$i]) == "description") $meta['description'] = $out[2][$i];
 }

return $meta; 
}

?>
