When I try to index data containing accented or other non-ASCII word characters, I get notices like:
PHP Notice: Undefined offset: 21731 in /var/www/dev/plugins/sfLucenePlugin/lib/vendor/Zend/Search/Lucene/Index/SegmentInfo.php on line 1388
Notice: Undefined offset: 21731 in /var/www/dev/plugins/sfLucenePlugin/lib/vendor/Zend/Search/Lucene/Index/SegmentInfo.php on line 1388
PHP Notice: Trying to get property of non-object in /var/www/dev/plugins/sfLucenePlugin/lib/vendor/Zend/Search/Lucene/Index/SegmentInfo.php on line 1388
I'm using symfony 1.0 and the svn version of sfLucene, with UTF-8 encoding and mbstring enabled.
To fix these errors, we need to pass the current encoding to the mb_strtolower() call in sfLuceneLowerCaseFilter (as is done in Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8):
Index: lib/addon/Zend/Search/Lucene/sfLuceneLowerCaseFilter.class.php
===================================================================
--- lib/addon/Zend/Search/Lucene/sfLuceneLowerCaseFilter.class.php (revision 7457)
+++ lib/addon/Zend/Search/Lucene/sfLuceneLowerCaseFilter.class.php (working copy)
@@ -15,10 +15,12 @@
 class sfLuceneLowerCaseFilter extends Zend_Search_Lucene_Analysis_TokenFilter_LowerCase
 {
   protected $mbString = false;
+  protected $encoding = null;

-  public function __construct($mbString = false)
+  public function __construct($mbString = false, $encoding = null)
   {
     $this->mbString = $mbString;
+    $this->encoding = $encoding;
   }

   /**
@@ -31,7 +33,7 @@
   {
     if ($this->mbString)
     {
-      $value = mb_strtolower( $srcToken->getTermText() );
+      $value = mb_strtolower( $srcToken->getTermText(), $this->encoding);
     }
     else
     {
Index: lib/sfLucene.class.php
===================================================================
--- lib/sfLucene.class.php (revision 7457)
+++ lib/sfLucene.class.php (working copy)
@@ -346,7 +346,7 @@
     if (!$this->caseSensitive)
     {
-      $analyzer->addFilter(new sfLuceneLowerCaseFilter($this->mbString));
+      $analyzer->addFilter(new sfLuceneLowerCaseFilter($this->mbString, $this->encoding));
     }
     if (count($this->stopWords))
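
The effect of passing the encoding can be checked outside sfLucene. The sketch below is only an illustration: `lowercase_term()` is a hypothetical helper (not part of the plugin), and it assumes that the unpatched call was falling back to a single-byte internal encoding such as ISO-8859-1, which I have not verified on every setup.

```php
<?php
// Hypothetical helper, not part of sfLucene: lowercases a term with the
// encoding passed explicitly, as the patched filter does.
function lowercase_term($text, $encoding)
{
    return mb_strtolower($text, $encoding);
}

// "ÉCOLE" written as raw UTF-8 bytes, so the result does not depend on
// how this file itself is saved.
$term = "\xC3\x89COLE";

// Correct encoding: the two-byte "É" becomes the two-byte "é".
$good = lowercase_term($term, 'UTF-8');

// Assumed wrong encoding: each byte is treated as a separate character,
// so the lead byte 0xC3 is lowercased on its own and the sequence breaks.
$bad = lowercase_term($term, 'ISO-8859-1');

var_dump(mb_check_encoding($good, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($bad, 'UTF-8'));  // bool(false)
```

If the second result is an invalid UTF-8 string on your setup too, that would be consistent with the corrupted terms behind the "Undefined offset" notices above.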