str_word_count

(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)

str_word_count - Return information about words used in a string

Description

str_word_count(string $string, int $format = 0, ?string $charlist = null): array|int

Counts the number of words inside string. If the optional format parameter is not specified, the function returns an integer giving the number of words found. If format is specified, the function returns an array whose contents depend on format. The possible values for format and the corresponding outputs are listed below.

For the purpose of this function, 'words' are defined as locale-dependent strings containing alphabetic characters, which may also contain, but not start with, the "'" and "-" characters.
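A small check of this definition, assuming the default "C" locale (where only ASCII letters count as alphabetic): an apostrophe or hyphen inside a word keeps it whole, while a digit splits it.

```php
<?php
// Assumes the default "C" locale: only ASCII letters are alphabetic.
// "'" and "-" inside a word do not split it.
$words = str_word_count("It's a well-known fact", 1);
print_r($words);

// Digits are not alphabetic, so "fri3nd" is counted as "fri" and "nd".
echo str_word_count("fri3nd"), "\n";
```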

Parameters

string

The string.

format

Specifies the return value of this function. The currently supported values are:

  • 0 - returns the number of words found
  • 1 - returns an array containing all the words found inside string
  • 2 - returns an associative array, where the key is the numeric position (byte offset) of the word inside string and the value is the word itself
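Because the format 2 keys are byte offsets into string, they can be passed back to the ordinary string functions. As a sketch (bracket_words() is a hypothetical helper, not part of PHP), the following wraps every word in brackets, processing the words from last to first so that inserting characters never shifts an offset that is still to be used.

```php
<?php
// Hypothetical helper: wrap every word found by str_word_count() in [brackets].
function bracket_words(string $text): string {
    $words = str_word_count($text, 2); // byte offset => word
    // Walk backwards so edits near the end do not move earlier offsets.
    foreach (array_reverse($words, true) as $offset => $word) {
        $text = substr_replace($text, '[' . $word . ']', $offset, strlen($word));
    }
    return $text;
}

echo bracket_words("Hello fri3nd, you're looking good!"), "\n";
// [Hello] [fri]3[nd], [you're] [looking] [good]!
```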

charlist

A list of additional characters which will be considered as 'word' characters.

Return Values

Returns an integer or an array, depending on the format chosen.

Changelog

Version Description
5.1.0 Added the charlist parameter

Examples

Example #1 A str_word_count() example

<?php

$str = "Hello fri3nd, you're
        looking          good today!";

print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));

echo str_word_count($str);

?>

The above example will output:

Array
(
    [0] => Hello
    [1] => fri
    [2] => nd
    [3] => you're
    [4] => looking
    [5] => good
    [6] => today
)

Array
(
    [0] => Hello
    [6] => fri
    [10] => nd
    [14] => you're
    [29] => looking
    [46] => good
    [51] => today
)

Array
(
    [0] => Hello
    [1] => fri3nd
    [2] => you're
    [3] => looking
    [4] => good
    [5] => today
)

7

See Also

  • explode() - Split a string by a string
  • preg_split() - Split a string by a regular expression
  • split() (deprecated in PHP 5.3.0, removed in PHP 7.0.0)
  • count_chars() - Return information about characters used in a string
  • substr_count() - Count the number of substring occurrences


User Contributed Notes 11 notes

cito at wikatu dot com
14 years ago
<?php

/***
 * This simple UTF-8 word count function (it only counts)
 * is a bit faster than the one with preg_match_all and
 * about 10x slower than the built-in str_word_count.
 *
 * If you need the hyphen or other code points as word characters,
 * just put them into the [brackets], like [^\p{L}\p{N}\'\-].
 * With the u modifier the pattern itself must be valid UTF-8,
 * so if it contains non-ASCII characters and the script is saved
 * as ISO-8859-1, convert the pattern first (e.g. with utf8_encode()).
 **/

// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
  return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
?>
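Following the comment's suggestion, here is a sketch of the same splitter with the hyphen added to the character class. It also passes PREG_SPLIT_NO_EMPTY, which the original note does not use, so that punctuation at the start or end of the string does not produce an empty element that is counted as a word. The function name is hypothetical.

```php
<?php
// Hypothetical variant: letters, digits, "'" and "-" are word characters.
// PREG_SPLIT_NO_EMPTY drops the empty strings preg_split() produces when
// the input starts or ends with a separator.
function str_word_count_utf8_hyphen(string $str): int {
    $words = preg_split('~[^\p{L}\p{N}\'\-]+~u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8 input.
    return $words === false ? 0 : count($words);
}

echo str_word_count_utf8_hyphen("It's a well-known fact."), "\n"; // 4
echo str_word_count_utf8_hyphen("Not wôrking."), "\n";            // 2
```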
splogamurugan at gmail dot com
17 years ago
We can also specify a range of values for charlist.

<?php
$str = "Hello fri3nd, you're
       looking          good today! 
       look1234ing";
print_r(str_word_count($str, 1, '0..3'));
?>

gives the following result:

Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )
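The same ".." range syntax works for any span of characters. As a further sketch, passing '0..9' makes every digit a word character, so alphanumeric tokens are kept whole:

```php
<?php
// '0..9' expands to all ten digits, which then count as word characters.
$tokens = str_word_count("R2D2 and C3PO", 1, '0..9');
print_r($tokens); // R2D2, and, C3PO
```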
Adeel Khan
18 years ago
<?php

/**
 * Returns the number of words in a string.
 * As far as I have tested, it is very accurate.
 * The string can contain HTML,
 * but you should do something like this first:
 *
 *    $search = array(
 *      '@<script[^>]*?>.*?</script>@si',
 *      '@<style[^>]*?>.*?</style>@si',
 *      '@<![\s\S]*?--[ \t\n\r]*>@'
 *    );
 *    $html = preg_replace($search, '', $html);
 *
 */

function word_count($html) {

  # strip all HTML tags
  $wc = strip_tags($html);

  # replace everything that is not a word character or common punctuation with a space
  $wc = preg_replace('#[^\w\'".!?;,\\\\/:&@-]+#', ' ', $wc);

  # remove 'words' that consist only of punctuation (e.g. a lone "-" or "...")
  $wc = preg_replace('#(?<!\S)[\'".!?;,\\\\/:&@-]+(?!\S)#', ' ', $wc);

  # split on whitespace, ignoring empty pieces (so a word like "0" is kept)
  $words = preg_split('/\s+/', $wc, -1, PREG_SPLIT_NO_EMPTY);

  # return the number of words
  return count($words);

}

?>
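The docblock's preprocessing and the counting can be combined into one function. The following is a sketch under stated assumptions (html_word_count() is a hypothetical name, the input is assumed to be UTF-8, and a Unicode-aware preg_split() is swapped in for the note's punctuation regexes). It also inserts a space before every tag, because strip_tags() alone would glue "<p>one</p><p>two</p>" into the single word "onetwo".

```php
<?php
// Hypothetical helper: count words in UTF-8 HTML.
function html_word_count(string $html): int {
    // Drop <script>, <style> and comment blocks together with their contents.
    $html = preg_replace(
        ['@<script[^>]*?>.*?</script>@si', '@<style[^>]*?>.*?</style>@si', '@<![\s\S]*?--[ \t\n\r]*>@'],
        '',
        $html
    );
    // A space before every tag keeps adjacent elements from merging.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; or &nbsp; into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Letters, digits, "'" and "-" form words; everything else separates.
    $words = preg_split('/[^\p{L}\p{N}\'-]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? 0 : count($words);
}

echo html_word_count('<p>one</p><p>two</p>'), "\n"; // 2
```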
manrash at gmail dot com
17 years ago
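For Spanish speakers, a suitable character map may be:

<?php
// charlist works byte by byte, so list upper-case letters too
$characterMap = 'áéíóúüñÁÉÍÓÚÜÑ';

$text = "El niño comió piña";
$count = str_word_count($text, 0, $characterMap); // 4
?>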
uri at speedy dot net
13 years ago
Here is a word-count function that supports UTF-8 and Hebrew. I tried other functions, but they didn't work. Note that in Hebrew '"' and '\'' can be used inside words, so they are not separators. This function is not perfect: I would prefer the approach we use in JavaScript, which considers all characters except [a-zA-Zא-ת0-9_\'\"] as separators, but I don't know how to do that in PHP.

I removed some of the separators that don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underscore.

This is a fix to my previous post on this page: my function returned an incorrect result for an empty string. I corrected it, and I'm also attaching another function, my_strlen().

<?php 

function count_words($string) {
    // Return the number of words in a string.
    $string= str_replace("&#039;", "'", $string);
    $t= array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string= str_replace($t, " ", $string);
    $string= trim(preg_replace("/\s+/", " ", $string));
    $num= 0;
    if (my_strlen($string)>0) {
        $word_array= explode(" ", $string);
        $num= count($word_array);
    }
    return $num;
}

function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}

?>
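The JavaScript approach mentioned in this note can be expressed in PHP with preg_split() and the u modifier, writing the Hebrew letter range by code point (U+05D0 א to U+05EA ת). This is a sketch with a hypothetical function name; like the JavaScript class, the range does not include Hebrew vowel points.

```php
<?php
// Hypothetical helper: every character except ASCII letters/digits, "_",
// the Hebrew letters U+05D0..U+05EA, "'" and '"' is a separator.
function count_words_he(string $string): int {
    $words = preg_split('/[^a-zA-Z\x{05D0}-\x{05EA}0-9_\'"]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8 input.
    return $words === false ? 0 : count($words);
}

echo count_words_he('שלום, עולם! צה"ל'), "\n"; // 3
```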
brettNOSPAM at olwm dot NO_SPAM dot com
23 years ago
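This example may not be pretty, but it has proven accurate:

<?php
// count words ($body holds the text to count)
$words_to_count = strip_tags($body);
// replace everything except word characters and common punctuation with a space
$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";
$words_to_count = preg_replace($pattern, " ", $words_to_count);
$words_to_count = trim($words_to_count);
// explode() on an empty string returns one empty element, so handle it separately
$total_words = ($words_to_count === '') ? 0 : count(explode(" ", $words_to_count));
?>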

Hope I didn't miss any punctuation. ;-)
php dot net at salagir dot com
8 years ago
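This function doesn't handle accented characters, even when a locale with accents is set: it works on single bytes, so a multibyte UTF-8 character such as "ô" is not recognized as a letter.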
<?php
echo str_word_count("Is working"); // =2

setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expects 2, got 3.
?>

Cito's solution counts leading or trailing punctuation as an extra word, so it isn't a good workaround either:
<?php
function str_word_count_utf8($str) {
      return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
echo str_word_count_utf8("Is wôrking"); //=2
echo str_word_count_utf8("Not wôrking."); //=3
?>

My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
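Building on the preg-based approaches above, the following sketch mimics the three format values of str_word_count() for UTF-8 text. It is an illustration, not a drop-in replacement: the function name is hypothetical, the word pattern (letters and digits, optionally joined by single "'" or "-" characters) is an assumption chosen here, and unlike the built-in it treats digits as word characters. As with the built-in, format 2 keys are byte offsets (as returned by PREG_OFFSET_CAPTURE).

```php
<?php
// Hypothetical UTF-8 counterpart of str_word_count() supporting formats 0, 1 and 2.
function mb_str_word_count(string $str, int $format = 0)
{
    // A word: letters/digits, optionally joined by single "'" or "-" characters.
    $pattern = '/[\p{L}\p{N}]+(?:[\'-][\p{L}\p{N}]+)*/u';
    if (preg_match_all($pattern, $str, $matches, PREG_OFFSET_CAPTURE) === false) {
        return $format === 0 ? 0 : []; // e.g. invalid UTF-8
    }
    $found = $matches[0]; // list of [word, byte offset] pairs

    if ($format === 1) {
        return array_column($found, 0);    // list of words
    }
    if ($format === 2) {
        return array_column($found, 0, 1); // byte offset => word
    }
    return count($found);
}

echo mb_str_word_count("Not wôrking."), "\n"; // 2
```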
dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64)
15 years ago
To count words after converting an MS Word document to plain text with antiword, you can use this function:

<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text); // remove these chars (you can list more inside str_split())
    $text = preg_replace('/-{2,}/', '', $text); // remove runs of 2 or more dashes (before collapsing spaces)
    $text = trim(preg_replace('/\s+/', ' ', $text)); // collapse whitespace to single spaces
    $len = strlen($text);
    
    if (0 === $len) {
        return 0;
    }
    
    $words = 1;
    
    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }
    
    return $words;
}
?>

It strips the pipe "|" characters, which antiword uses to format tables in its plain-text output, removes runs of two or more dashes (also used in tables), collapses whitespace, and then counts the words.

Counting words with explode() and then count() is not a good idea for huge texts, because it uses a lot of memory to store the text once more as an array. This is why I'm using while() { ... } to walk the string.
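Since the point is to avoid building an array, substr_count() can also do the counting in a single call once the text has been normalised to single spaces: n words are separated by n - 1 spaces. A sketch, with a hypothetical function name and the same clean-up steps as the note's function:

```php
<?php
// Hypothetical variant: same antiword clean-up, counting spaces with substr_count().
function count_words_substr(string $text): int {
    $text = str_replace('|', '', $text);             // antiword table borders
    $text = preg_replace('/-{2,}/', '', $text);      // dash runs used in tables
    $text = trim(preg_replace('/\s+/', ' ', $text)); // single spaces between words
    // n words are separated by n - 1 spaces
    return $text === '' ? 0 : substr_count($text, ' ') + 1;
}

echo count_words_substr("| a | b |\n-----\nc  d"), "\n"; // 4
```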
brettz9 - see yahoo
15 years ago
Words also cannot end in a hyphen unless allowed by the charlist...
charliefrancis at gmail dot com
16 years ago
Hi, this is the first time I have posted in the PHP manual; I hope some of you will like this little function I wrote.

It returns a string cut to a certain character limit, while still keeping whole words.
It breaks out of the foreach loop once it has found a string short enough to display, and the list of extra word characters ($chars) can be edited.

<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
    if( strlen( $text ) > $limit ) {
        $words = str_word_count( $text, 2, $chars );
        $words = array_reverse( $words, TRUE );
        foreach( $words as $length => $word ) {
            if( $length + strlen( $word ) >= $limit ) {
                array_shift( $words );
            } else {
                break;
            }
        }
        $words = array_reverse( $words );
        $text = implode( " ", $words ) . '&hellip;';
    }
    return $text;
}

$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str ) . "\n";
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str ) . "\n";
?>
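A shorter route to a similar result, sketched here as an alternative (word_limiter_wrap() is a hypothetical name), is to let wordwrap() break the text at spaces and keep only the first line. Unlike the note's implode() of the words, this keeps the original punctuation. It assumes single-line input, and it may keep a line of exactly $limit bytes, whereas the original keeps strictly fewer.

```php
<?php
// Hypothetical alternative to word_limiter(): wordwrap() replaces spaces with
// "\n" so that no line exceeds $limit bytes (unless one word is longer), and
// strtok() returns the first line.
function word_limiter_wrap(string $text, int $limit = 30): string {
    if (strlen($text) <= $limit) {
        return $text;
    }
    $first = strtok(wordwrap($text, $limit, "\n"), "\n");
    return $first . '&hellip;';
}

echo word_limiter_wrap("Hello this is a list of words that is too long"), "\n";
// Hello this is a list of words&hellip;
```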

1: Hello this is a list of words&hellip;
2: Hello this is a list of words
MadCoder
20 years ago
Here's a function that trims a $string down to a certain number of words and adds "..." to the end of it.
(An expansion of muz1's "first 100 words" code.)

----------------------------------------------
<?php
function trim_text($text, $count) {
    // split on any run of whitespace, ignoring empty pieces
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $count) {
        return implode(" ", $words); // short enough: nothing to trim
    }
    // keep the first $count words and mark the cut
    return implode(" ", array_slice($words, 0, $count)) . "...";
}
?>

Usage
------------------------------------------------
<?php
$string = "one two three four";
echo trim_text($string, 3);
?>

returns:
one two three...
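A variant that relies on str_word_count() itself: format 2 gives the byte offset of each word, so the original text can be cut right after the last word to keep, preserving its punctuation. This is a sketch; trim_words() is a hypothetical name, and "word" here follows str_word_count()'s definition rather than splitting on spaces.

```php
<?php
// Hypothetical helper: keep the first $count words and cut the original
// text right after the last kept word.
function trim_words(string $text, int $count): string {
    if ($count < 1) {
        return '';
    }
    $words = str_word_count($text, 2); // byte offset => word
    if (count($words) <= $count) {
        return $text; // nothing to cut
    }
    $offsets = array_keys($words);
    $last = $offsets[$count - 1];
    return substr($text, 0, $last + strlen($words[$last])) . '...';
}

echo trim_words("Hello, world! How are you?", 2), "\n"; // Hello, world...
```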