Here is a function I wrote to capitalize on the previous remarks about charset problems (UTF-8...) when using loadHTML and then DOM functions.
It adds a charset meta tag just after <head> to improve automatic encoding detection and converts every non-ASCII character to an HTML entity, so that PHP DOM functions/attributes return correct values.
<?php
mb_detect_order("ASCII,UTF-8,ISO-8859-1,windows-1252,iso-8859-15");
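function loadNprepare($url,$encod='') {
        $content        = file_get_contents($url);
        // file_get_contents() returns FALSE on failure, and loadHTML() rejects empty input
        if (empty($content)) return FALSE;
        if (empty($encod))
                $encod  = mb_detect_encoding($content);
        // mb_detect_encoding() returns FALSE when no candidate encoding matches
        if (empty($encod)) return FALSE;
        // pass $encod explicitly so positions are not computed in the internal encoding
        $headpos        = mb_strpos($content,'<head>',0,$encod);
        if (FALSE=== $headpos)
                $headpos= mb_strpos($content,'<HEAD>',0,$encod);
        if (FALSE!== $headpos) {
                $headpos+=6; // length of '<head>'
                $content = mb_substr($content,0,$headpos,$encod) . '<meta http-equiv="Content-Type" content="text/html; charset='.$encod.'">' .mb_substr($content,$headpos,NULL,$encod);
        }
        // note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2
        $content=mb_convert_encoding($content, 'HTML-ENTITIES', $encod);
        $dom = new DOMDocument;
        $res = $dom->loadHTML($content);
        if (!$res) return FALSE;
        return $dom;
}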
?>
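To see the effect of the entity conversion in isolation, here is a minimal sketch on an in-memory string. It is not the function above: because 'HTML-ENTITIES' is deprecated as of PHP 8.2, it swaps in mb_encode_numericentity(). The sample markup and the variable names ($html, $convmap, $prepared, $text) are only illustrative assumptions.

```php
<?php
// Sample markup; "\xC3\xA9" is the UTF-8 byte sequence for "é".
$html = "<html><head><title>t</title></head><body><p>Caf\xC3\xA9</p></body></html>";

// Conversion map: encode every code point from 0x80 to 0x10FFFF,
// with offset 0 and a mask wide enough to cover that range.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Non-ASCII characters become numeric entities (here "&#233;"),
// so the parser no longer has to guess the byte encoding.
$prepared = mb_encode_numericentity($html, $convmap, 'UTF-8');

$dom = new DOMDocument;
$dom->loadHTML($prepared);

// DOM accessors return UTF-8 strings.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
```

With the characters encoded as entities, $text should come back as the original UTF-8 string regardless of which encoding the parser would otherwise have assumed.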
NB: loadNprepare uses mb_strpos/mb_substr instead of mb_ereg_replace because that seemed more efficient with huge HTML pages.
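One limit of the literal search is that it misses an opening tag with attributes, such as <head lang="en">, or with mixed case. As a hedged sketch, the insertion step can stay position-based while a single preg_match() locates the end of the tag. The helper name insertCharsetMeta is hypothetical, and the byte offsets it uses assume an ASCII-compatible encoding (UTF-8, ISO-8859-x, windows-1252).

```php
<?php
// Hypothetical helper, not part of the function above.
function insertCharsetMeta($content, $encod) {
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=' . $encod . '">';
    // Find the first opening <head ...> tag, case-insensitively.
    // \b prevents a match on <header>.
    if (preg_match('/<head\b[^>]*>/i', $content, $m, PREG_OFFSET_CAPTURE)) {
        // Byte offset just past the end of the matched tag.
        $pos = $m[0][1] + strlen($m[0][0]);
        // Length 0 makes substr_replace() insert rather than overwrite.
        return substr_replace($content, $meta, $pos, 0);
    }
    // No <head> found: leave the markup untouched.
    return $content;
}

$out = insertCharsetMeta('<html><HEAD lang="en"><title>x</title></HEAD></html>', 'UTF-8');
```

Only the first match is examined, so the pattern does not scan the whole page once <head> has been found. I have not measured it against the mb_strpos version on huge pages.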