#2922 ([PATCH] sfLucene : sfLuceneLowerCaseFilter does not handle correctly utf8 values)

Ticket #2922 (closed defect: fixed)

Opened 5 months ago

Last modified 5 months ago

[PATCH] sfLucene : sfLuceneLowerCaseFilter does not handle correctly utf8 values

Reported by: noel Assigned to: Carl.Vondrick
Priority: major Milestone:
Component: sfLucenePlugin Version: 1.0.10
Keywords: Cc: noel
Qualification: Accepted

Description

When I try to index data containing accents or other Word characters, I get notices like:

PHP Notice:  Undefined offset:  21731 in /var/www/dev/plugins/sfLucenePlugin/lib/vendor/Zend/Search/Lucene/Index/SegmentInfo.php on line 1388

Notice: Undefined offset:  21731 in /var/www/dev/plugins/sfLucenePlugin/lib/vendor/Zend/Search/Lucene/Index/SegmentInfo.php on line 1388
PHP Notice:  Trying to get property of non-object in /var/www/dev/plugins/sfLucenePlugin/lib/vendor/Zend/Search/Lucene/Index/SegmentInfo.php on line 1388

I'm using symfony 1.0 and the SVN version of sfLucene, with UTF-8 encoding and mbString enabled.

To fix these errors, we need to pass the current encoding to the mb_strtolower() call in sfLuceneLowerCaseFilter (as is done in Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8):

Index: lib/addon/Zend/Search/Lucene/sfLuceneLowerCaseFilter.class.php
===================================================================
--- lib/addon/Zend/Search/Lucene/sfLuceneLowerCaseFilter.class.php      (revision 7457)
+++ lib/addon/Zend/Search/Lucene/sfLuceneLowerCaseFilter.class.php      (working copy)
@@ -15,10 +15,12 @@
 class sfLuceneLowerCaseFilter extends Zend_Search_Lucene_Analysis_TokenFilter_LowerCase
 {
   protected $mbString = false;
+  protected $encoding = null;

-  public function __construct($mbString = false)
+  public function __construct($mbString = false, $encoding = null)
   {
     $this->mbString = $mbString;
+    $this->encoding = $encoding;
   }

   /**
@@ -31,7 +33,7 @@
   {
     if ($this->mbString)
     {
-      $value = mb_strtolower( $srcToken->getTermText() );
+      $value = mb_strtolower( $srcToken->getTermText(), $this->encoding);
     }
     else
     {
Index: lib/sfLucene.class.php
===================================================================
--- lib/sfLucene.class.php      (revision 7457)
+++ lib/sfLucene.class.php      (working copy)
@@ -346,7 +346,7 @@

     if (!$this->caseSensitive)
     {
-      $analyzer->addFilter(new sfLuceneLowerCaseFilter($this->mbString));
+      $analyzer->addFilter(new sfLuceneLowerCaseFilter($this->mbString, $this->encoding));
     }

     if (count($this->stopWords))
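
For illustration only (this is not part of the patch): a minimal sketch of why the second argument matters, assuming the mbstring extension is loaded. When the encoding is omitted, mb_strtolower() falls back to mb_internal_encoding(), which may not be UTF-8 on a given install. The sample term and the variable names are made up for this example, and the ISO-8859-1 call only simulates a non-UTF-8 internal encoding.

```php
<?php
// Hypothetical sample term: "ÉCOLE" as raw UTF-8 bytes ("É" is 0xC3 0x89).
$term = "\xC3\x89COLE";

// Explicit encoding, as in the patched filter: the two bytes of "É"
// are lowercased as a single character.
$asUtf8 = mb_strtolower($term, 'UTF-8');

// Simulates an internal encoding that is not UTF-8: each byte is treated
// as a separate character, so the multibyte sequence gets altered.
$asLatin1 = mb_strtolower($term, 'ISO-8859-1');

// Print the raw bytes of both results so they can be compared.
echo bin2hex($asUtf8), "\n";
echo bin2hex($asLatin1), "\n";
```

The two hex dumps differ: only the first is the UTF-8 encoding of "école". A term lowercased with the wrong encoding no longer matches the bytes of the original text, which is presumably what leads to the lookup notices above.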

Attachments

sfLucenePatch.txt (1.3 kB) - added by noel on 02/12/08 09:55:35.

Change History

02/12/08 09:55:35 changed by noel

  • attachment sfLucenePatch.txt added.

02/12/08 15:36:55 changed by Carl.Vondrick

  • priority changed from minor to major.
  • status changed from new to assigned.
  • qualification changed from Unreviewed to Accepted.

02/13/08 02:51:42 changed by Carl.Vondrick

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [7470]) sfLucene: [1.0] fixed sfLuceneLowerCaseFilter does not handle correctly utf8 values (closes #2922)