Zend_Search_Lucene works with the UTF-8 charset internally. Index files store Unicode data in Java's "modified UTF-8 encoding". The Zend_Search_Lucene core fully supports this encoding, with one exception.
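For reference, Java's modified UTF-8 (as defined for java.io.DataInput) differs from standard UTF-8 in two ways: U+0000 is written as the two bytes 0xC0 0x80, and characters above U+FFFF are written as a UTF-16 surrogate pair, three bytes per surrogate. The sketch below encodes a single code point that way. `encodeModifiedUtf8()` is a hypothetical helper written for illustration, not part of Zend_Search_Lucene.

```php
<?php
// Hypothetical helper (illustration only, not Zend_Search_Lucene code):
// encode one Unicode code point in Java's "modified UTF-8".
function encodeModifiedUtf8(int $cp): string
{
    if ($cp === 0) {
        // U+0000 uses two bytes, so encoded strings never contain a NUL byte.
        return "\xC0\x80";
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp <= 0xFFFF) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // Supplementary characters: split into a UTF-16 surrogate pair and encode
    // each surrogate as a three-byte sequence (six bytes in total).
    $v = $cp - 0x10000;
    $high = 0xD800 | ($v >> 10);
    $low = 0xDC00 | ($v & 0x3FF);
    return encodeModifiedUtf8($high) . encodeModifiedUtf8($low);
}

echo bin2hex(encodeModifiedUtf8(0x1F600)), "\n"; // eda0bdedb880
```

Standard UTF-8 would store U+1F600 in four bytes (f09f9880) instead of six.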
The actual encoding of input data may be specified through the Zend_Search_Lucene API; the data is then automatically converted to UTF-8.
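In plain PHP terms, converting input from another charset to UTF-8 looks like the iconv() call below. ISO-8859-1 is only an assumed example encoding, and this is a sketch of the idea rather than the code Zend_Search_Lucene runs internally.

```php
<?php
// Assumed example: the input arrives in ISO-8859-1 and must become UTF-8.
$latin1 = "caf\xE9";                          // "café" in ISO-8859-1: é is the single byte 0xE9
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($latin1), "\n";                  // 636166e9
echo bin2hex($utf8), "\n";                    // 636166c3a9 (é is now two bytes)
```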
However, the default text analyzer (which is also used within the query parser) uses ctype_alpha() to tokenize text and queries.
ctype_alpha() is not UTF-8 compatible, so the analyzer converts text to 'ASCII//TRANSLIT' encoding before indexing. The same processing is transparently performed during query parsing. 
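The following plain-PHP sketch shows why ctype_alpha() cannot be applied to UTF-8 text directly, and what an 'ASCII//TRANSLIT' conversion with iconv() does. It assumes the default "C" locale; the transliteration result is deliberately not asserted because it varies between iconv implementations.

```php
<?php
// Assumes the default "C" locale, where ctype_alpha() accepts only ASCII letters.
$word = "caf\u{00E9}";           // "café" encoded as UTF-8 (é is the two bytes 0xC3 0xA9)
var_dump(ctype_alpha($word));   // bool(false): the bytes of "é" are not letters here
var_dump(ctype_alpha('cafe'));  // bool(true)

// Transliterate to plain ASCII first, as the default analyzer does.
// The exact output of //TRANSLIT depends on the iconv implementation and locale.
$ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $word);
var_dump($ascii);
```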
The default analyzer doesn't treat numbers as parts of terms. Use the corresponding 'Num' analyzer if you don't want words to be broken up by numbers.
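To illustrate the difference, here are two hypothetical tokenizers, tokenizeLetters() and tokenizeLettersAndDigits() (not Zend_Search_Lucene classes). The first splits on anything that is not an ASCII letter, roughly like the default analyzer; the second keeps digits inside terms, roughly like a 'Num' analyzer. The real analyzers may apply additional filtering.

```php
<?php
// Hypothetical tokenizers approximating the two behaviours; not the ZF analyzers.
function tokenizeLetters(string $text): array
{
    // Letters only: digits act as separators, so "PHP5" yields "php".
    preg_match_all('/[a-z]+/i', $text, $m);
    return array_map('strtolower', $m[0]);
}

function tokenizeLettersAndDigits(string $text): array
{
    // Letters and digits: "PHP5" stays a single term.
    preg_match_all('/[a-z0-9]+/i', $text, $m);
    return array_map('strtolower', $m[0]);
}

print_r(tokenizeLetters('Zend Framework 1.5 for PHP5'));
print_r(tokenizeLettersAndDigits('Zend Framework 1.5 for PHP5'));
```

For this input the first function returns zend, framework, for, php, while the second returns zend, framework, 1, 5, for, php5.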
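Zend_Search_Lucene also contains a set of UTF-8 compatible analyzers: Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8, Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num, Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive and Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive.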
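Any of these analyzers can be enabled by registering an instance as the default analyzer, for example: Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());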
The UTF-8 compatible analyzers were improved in ZF 1.5. Earlier versions of the analyzers assumed that all non-ASCII characters are letters; the new implementation classifies characters more accurately.
This may require you to rebuild your index so that data and search queries are tokenized in the same way; otherwise the search engine may return wrong result sets.
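The behavioral change can be approximated with two regular expressions: a byte-oriented pattern that treats every non-ASCII byte as part of a letter run, and a Unicode-aware \p{L} pattern. Both patterns are assumptions chosen for illustration, not the ones the ZF analyzers use. They show how the same text can yield different terms, which is why an index built with one tokenization may not match queries tokenized with the other.

```php
<?php
// Hypothetical approximations of the old and new behaviour, not the actual ZF patterns.
$text = "na\u{00EF}ve\u{00B7}text";   // "naïve·text"; U+00B7 MIDDLE DOT is punctuation

// Old behaviour: every non-ASCII byte counts as part of a letter run.
preg_match_all('/[a-zA-Z\x80-\xFF]+/', $text, $old);

// New behaviour: only real Unicode letters (\p{L}) form terms; needs the /u modifier.
preg_match_all('/\p{L}+/u', $text, $new);

print_r($old[0]); // a single term containing the middle dot
print_r($new[0]); // two terms: "naïve" and "text"
```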
All of these analyzers need the PCRE (Perl-compatible regular expressions) library to be compiled with UTF-8 support turned on. UTF-8 support is turned on in the PCRE library sources bundled with the PHP source code distribution, but if a shared library is used instead of the bundled one, whether UTF-8 support is available may depend on your operating system.
You can check whether PCRE UTF-8 support is enabled by testing whether a pattern that uses a Unicode property such as \pL compiles in UTF-8 mode (the /u modifier).
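A minimal check, assuming it is enough to see whether a pattern using the Unicode letter property \pL compiles with the /u modifier (preg_match() returns false, with a warning, when PCRE lacks UTF-8 support):

```php
<?php
// \pL (any Unicode letter) with /u compiles only if PCRE has UTF-8 support;
// the @ suppresses the warning preg_match() emits when compilation fails.
$pcreUtf8 = @preg_match('/\pL/u', 'a') === 1;

if ($pcreUtf8) {
    echo "PCRE unicode support is turned on.\n";
} else {
    echo "PCRE unicode support is turned off.\n";
}
```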
Case-insensitive versions of the UTF-8 compatible analyzers also need the mbstring extension to be enabled.
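A short sketch for checking the extension at runtime with extension_loaded(), plus a demonstration of mb_strtolower() folding a non-ASCII capital letter when mbstring is available:

```php
<?php
// The case-insensitive UTF-8 analyzers need mbstring; report whether it is loaded.
$mbstringLoaded = extension_loaded('mbstring');
echo $mbstringLoaded ? "mbstring is enabled.\n" : "mbstring is not enabled.\n";

if ($mbstringLoaded) {
    // mb_strtolower() also lowercases non-ASCII letters: "ÉCOLE" becomes "école".
    echo mb_strtolower("\u{00C9}COLE", 'UTF-8'), "\n";
}
```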
If you don't want to enable the mbstring extension but still need case-insensitive search, you can normalize the source data before indexing and the query string before searching by converting both to lowercase:

    // Indexing
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', strtolower($contents)));

    // Title field for search through (indexed, unstored)
    $doc->addField(Zend_Search_Lucene_Field::UnStored('title', strtolower($title)));

    // Title field for retrieving (unindexed, stored)
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('_title', $title));

    // Searching
    $hits = $index->find(strtolower($query));
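The key point of this approach is that indexed text and queries go through exactly the same normalization. The sketch below wraps it in a hypothetical normalizeForSearch() helper. One limitation worth knowing: strtolower() works byte by byte and, from PHP 8.2 on, lowercases only ASCII A-Z (earlier versions depend on the current LC_CTYPE locale), so non-ASCII capitals such as "É" are not folded by it.

```php
<?php
// Hypothetical helper: apply the same normalization to indexed text and to queries.
function normalizeForSearch(string $text): string
{
    // In PHP 8.2+ strtolower() converts only ASCII A-Z; earlier versions
    // follow the LC_CTYPE locale. Multibyte UTF-8 letters are left alone.
    return strtolower($text);
}

$indexed = normalizeForSearch('Zend Search Lucene');
$query = normalizeForSearch('LUCENE');
var_dump(strpos($indexed, $query) !== false);   // bool(true)

// Non-ASCII capitals stay unchanged, so "ÉCOLE" normalizes to "École", not "école".
echo normalizeForSearch("\u{00C9}COLE"), "\n";
```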