utf8_encode

(PHP 4, PHP 5, PHP 7, PHP 8)

utf8_encode - Converts a string from ISO-8859-1 to UTF-8

Warning

This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

#[\Deprecated]
function utf8_encode(string $string): string

This function converts the string string from the ISO-8859-1 encoding to UTF-8.
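
To make the conversion concrete, here is a minimal sketch of the byte-level mapping. The helper name latin1_to_utf8() is hypothetical and not part of PHP, and the code illustrates the mapping only; it is not the engine's actual implementation. It relies on the fact that every ISO-8859-1 byte maps to the Unicode code point with the same numeric value: bytes below 0x80 are copied unchanged, and bytes 0x80 to 0xFF become a two-byte UTF-8 sequence.

```php
<?php
// Hypothetical helper for illustration only; not part of PHP.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        if ($byte < 0x80) {
            // ASCII range: identical in ISO-8859-1 and UTF-8.
            $out .= chr($byte);
        } else {
            // U+0080..U+00FF: two-byte sequence of the form 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

// 'Zoë' in ISO-8859-1
echo bin2hex(latin1_to_utf8("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```

Because every input byte takes one of these two branches, the conversion has no failure case, whatever the input actually contains.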

Note:
This function does not attempt to guess the current encoding of the provided string; it assumes the string is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but it will not produce a useful string if a different encoding was intended.

Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 defines additional printable characters, such as the euro sign (€) and curly quotes (“ ”), in place of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
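
The difference between the two encodings can be seen on a single byte. The sketch below assumes the mbstring extension is installed, and the variable names are illustrative. It converts the byte 0x80 once as ISO-8859-1 and once as Windows-1252.

```php
<?php
// Requires the mbstring extension.
$byte = "\x80";

// Read as ISO-8859-1, 0x80 is the C1 control character U+0080.
$as_latin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
echo bin2hex($as_latin1), "\n"; // c280

// Read as Windows-1252, the same byte is U+20AC EURO SIGN.
$as_cp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
echo bin2hex($as_cp1252), "\n"; // e282ac
```

Both calls succeed. Only the declared source encoding decides whether the result is an invisible control character or the intended euro sign.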

Parameters

string: An ISO-8859-1 string.

Return Values

Returns the UTF-8 translation of string.

Changelog

Version	Description
8.2.0	This function has been deprecated.
7.2.0	This function was moved from the XML extension into the PHP core. In earlier versions, it was available only if the XML extension was installed.

Examples

Example #1 Basic example

<?php
// Convert the string 'Zoë' from ISO-8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab

Notes

Note: Deprecation and alternatives

As of PHP 8.2.0, this function is deprecated and will be removed in a future version. Existing uses should be checked and replaced with appropriate alternatives.

Similar functionality can be achieved with mb_convert_encoding(), which supports ISO-8859-1 and many other character encodings.
<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same byte in ISO-8859-7 represents 'λ' (Greek small letter lambda)
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
c3ab
cebb
e282ac
Other options may be available depending on the extensions installed, such as UConverter::transcode() and iconv().

The following all give the same result:
<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>
The above example will output:
5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab
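
When replacing existing calls, one pitfall is converting text that is already UTF-8, which encodes it a second time. The following is a minimal sketch of a guard against that, assuming the mbstring extension is installed. The helper name ensure_utf8_from_latin1() is hypothetical and not part of PHP. The check is a heuristic rather than true encoding detection: it only skips input that is already a valid UTF-8 byte sequence.

```php
<?php
// Hypothetical migration helper; the name is illustrative, not part of PHP.
// Requires the mbstring extension.
function ensure_utf8_from_latin1(string $input): string
{
    // Input that is already valid UTF-8 is returned unchanged
    // to avoid double-encoding.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Otherwise the bytes are assumed to be ISO-8859-1.
    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "\x5A\x6F\xEB";     // 'Zoë' in ISO-8859-1
$utf8   = "\x5A\x6F\xC3\xAB"; // 'Zoë' already in UTF-8

echo bin2hex(ensure_utf8_from_latin1($latin1)), "\n"; // 5a6fc3ab
echo bin2hex(ensure_utf8_from_latin1($utf8)), "\n";   // 5a6fc3ab

// Converting the UTF-8 string unconditionally double-encodes it:
echo bin2hex(mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1')), "\n"; // 5a6fc383c2ab
```

An ISO-8859-1 string whose bytes happen to form valid UTF-8 would be left unconverted by this guard, so knowing the source encoding up front remains the more reliable approach.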

See Also

utf8_decode() - Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
mb_convert_encoding() - Convert a string from one character encoding to another
UConverter::transcode() - Convert a string from one character encoding to another
iconv() - Convert a string from one character encoding to another


User Contributed Notes 3 notes

deceze at gmail dot com ¶

14 years ago

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.

Aidan Kehoe <php-manual at parhasard dot net> ¶

21 years ago

Here's some code that addresses the issue that Steven describes in the previous comment; 

<?php

/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
   as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
   the UTF-8 encoding of the non-control characters that Windows-1252 places
   at the equivalent code points. */

$cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
    "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
    "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
    "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
    "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
    "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
    "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
    "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
    "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
    "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
    "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
    "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
    "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
    "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
    "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
    "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
    "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
    "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
    "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
    "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

    "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
    "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
    "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
    "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION*/
    "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
    "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
    "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS*/
);

function cp1252_to_utf8($str) {
    global $cp1252_map;
    return strtr(utf8_encode($str), $cp1252_map);
}

?>

Mark AT modernbill DOT com ¶

21 years ago

If you haven't guessed already: If the UTF-8 character has no representation in the ISO-8859-1 codepage, a ? will be returned. You might want to wrap a function around this to make sure you aren't saving a bunch of ???? into your database.
