sfMogileFSPlugin

Enables symfony applications to interact with MogileFS. MogileFS (an anagram of "OMG Files") was created by LiveJournal to handle the storage, replication, and retrieval of the large volume of file uploads it receives. Many of the web's most popular sites use MogileFS as their file store, including Blip.tv, Digg, Last.fm, Friendster, Guba, Vox, and many others.

MogileFS Terminology

The following explanations largely restate what is already on the official MogileFS website; we hope to expand on them as needed.

MogileFS Components:

  • Trackers: Act as the intermediary between the client library, the database, and the storage nodes.
  • Database: maintains a list of storage nodes, files, etc.
  • Storage Nodes: Physical disks where files are actually stored.
  • Client Library: Talks to trackers. sfMogileFSPlugin acts as the client library in our case.

File Organization:

  • Domain: Top level separation of files. A domain consists of a set of classes that define the files within it.
  • Class: Part of exactly one domain. A Class, really, 'only specifies the minimum replica count of a file'. Examples of classes: image, thumbnail, video.
  • Key: A unique textual string that identifies a file. Keys are unique within domains. Examples of keys: userpicture:34:39, Bfjkd34284FDFD43432FDJKLDFjfdmnb.
  • Minimum Replica Count: Property of a 'class' that defines how many times the files in that class are to be replicated onto different devices.
  • File: A defined collection of bits stored in MogileFS. Files are replicated according to their minimum replica count. Each file has a 'key', is a part of one 'class', and is located in one 'domain'.
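
To make the relationship between domains, classes, and keys concrete, here is a toy in-memory model. It is only an illustration of the terminology above: the function names and the replica counts are made up and are not part of MogileFS or sfMogileFSPlugin.

```php
<?php
// Illustrative model only; none of these names exist in MogileFS or the plugin.

// domain => (class => minimum replica count); the counts are invented
$domains = array(
  'mydomain' => array('image' => 3, 'thumbnail' => 2),
);

// Keys are unique within a domain, so (domain, key) identifies exactly one file.
function toy_file_id($domain, $key)
{
  return $domain.'/'.$key;
}

// A file takes its minimum replica count from the class it belongs to.
function toy_min_replicas(array $domains, $domain, $class)
{
  if (!isset($domains[$domain][$class]))
  {
    throw new InvalidArgumentException("Unknown class '$class' in domain '$domain'");
  }
  return $domains[$domain][$class];
}

echo toy_file_id('mydomain', 'userpicture:34:39'), "\n";
echo toy_min_replicas($domains, 'mydomain', 'thumbnail'), "\n";
```

The same key may appear in two different domains without conflict, which is why the identifier above includes the domain.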

MogileFS Installation

Since there are no good installation instructions available for MogileFS and since it's always been a bit of a black art, they are given here for now. Much inspiration was taken from Brett Durrett's original instructions. These instructions have only been tested on Debian. It would be nice to have a MogileFS Debian package maintainer.

For simplicity's sake, it is assumed that you'll initially be running everything (MogileFS database, trackers, and storage) on one server with an IP address of 127.0.0.1.

Download the MogileFS trunk.

  $ cd; mkdir mfs-trunk; cd mfs-trunk
  $ svn co http://code.sixapart.com/svn/mogilefs/trunk .

Install all dependencies.

$ perl Makefile.PL; make; make test

Note the dependencies listed by make test and either install them by hand or via CPAN.

Install dependency from CPAN (example):

# cpan
cpan> install Gearman::Client
... and just follow the directions ...

If you have to debug because of problems installing, you can set the environment variable VERBOSE to true like so:

$ VERBOSE=1; export VERBOSE

You must install libc6-dev if you don't already have it.

# apt-get install libc6-dev

Install package directly (example):

Either download the package and extract it or download it via CPAN (and find it in ~/.cpan/build), then run:

$ perl Makefile.PL; make; make test
... and if all tests pass ...
# make install

Once all dependencies are installed you can move on to actually configuring MogileFS.

$ cd ~/mfs-trunk/server
$ perl Makefile.PL; make; make test
# make install

$ cd ~/mfs-trunk/utils
$ perl Makefile.PL; make; make test
# make install

Set up hostnames for the various MogileFS services/daemons:

# vi /etc/hosts
127.0.0.1 mfs-db mfs-files

Next, set up the MogileFS database by typing the following command and following the instructions:

$ cd ~/mfs-trunk/server
$ ./mogdbsetup --dbhost=mfs-db.mydomain.com --dbname=mogilefs --dbuser=mogile --dbpass=mypass --dbrootuser=root --dbrootpass=myrootpass

Set up trackers:

# adduser mogilefs
# cd /etc; mkdir mogilefs; vi mogilefs/mogilefsd.conf
... and add the following ...
db_dsn DBI:mysql:mogilefs:mfs-db.mydomain.com
db_user mogile
db_pass mypass
conf_port 7001
listener_jobs 5
trackers = 127.0.0.1

Set up storage node:

# mkdir /var/mogilefs
# vi /etc/mogilefs/mogstored.conf
... and add the following ...
httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/var/mogilefs

Start storage node:

# mogstored --daemon

Start trackers:

$ su mogilefs
mogilefs@mydomain:~$ mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
mogilefs@mydomain:~$ exit

Add storage server to MogileFS database:

$ mogadm host add mfs-files --ip=127.0.0.1 --status=alive
$ mogadm device add mfs-files 1
# mkdir /var/mogilefs/dev1

Set up Perlbal:

# mkdir /etc/perlbal
# touch /etc/perlbal/perlbal.conf
# touch /etc/perlbal/nodelist.dat
... add the following to perlbal.conf ...
CREATE POOL my_apaches
  # change ip address and port as appropriate
  # remember that Perlbal runs on port 80 and your site runs on another port
  POOL my_apaches ADD 111.222.333.444:8080

CREATE SERVICE balancer
  SET listen             = 111.222.333.444:80
  SET pool               = my_apaches
  SET role               = reverse_proxy
  SET enable_reproxy     = true
  SET persist_client     = on
  SET persist_backend    = on
  SET verify_backend     = on
ENABLE balancer

CREATE POOL dynamic
  SET nodefile = nodelist.dat

... add the following to nodelist.dat ...
111.222.333.444

Additional Configuration:

Create a domain and a class (Note: this is domain specific)

$ mogadm domain add mydomain
$ mogadm class add mydomain my_file_class

Be aware that all configuration should be done via the mogadm tool. Simply type the following for a list of commands:

$ mogadm

sfMogileFSPlugin Installation

$ php symfony plugin-install http://plugins.symfony-project.com/sfMogileFSPlugin

In your app.yml file add:

dev:
  sfMogileFSPlugin:
    domain:     mogilefs_domain
    timeout:    5
    trackers:   [111.111.111.1:7001, 222.222.222.2:7001]

You will also need to define settings for your production environment.
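
Each entry in trackers is a host:port string. As a rough sketch of how such entries break down, the helper below splits one into its parts. split_tracker() is a hypothetical name, not a plugin function, and the default port of 7001 is an assumption taken from the conf_port value used in the tracker configuration earlier.

```php
<?php
// Hypothetical helper, not part of sfMogileFSPlugin: splits a "host:port" tracker entry.
function split_tracker($tracker, $defaultPort = 7001)
{
  $parts = explode(':', $tracker, 2);

  // fall back to the default port when none is given (assumed default, see lead-in)
  $port = (isset($parts[1]) && $parts[1] !== '') ? (int) $parts[1] : $defaultPort;

  return array('host' => $parts[0], 'port' => $port);
}

$trackers = array('111.111.111.1:7001', '222.222.222.2:7001');
foreach ($trackers as $tracker)
{
  $t = split_tracker($tracker);
  echo $t['host'], ' -> ', $t['port'], "\n";
}
```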

Modify your routing.yml (feel free to change "i", module, and action to whatever you'd like).

mfs:
  url:   /i/:mkey
  param: { module: file, action: showMogileFSFile }

Lastly, you'll need the cURL extension enabled in your php.ini.

Installation complete. Clear your cache.

N.B. This plugin was designed to primarily add and retrieve files from MogileFS. You will need to create your own media handling model(s), module(s) and action(s), though an example setup is given below.

How It All Works

Perlbal acts as a reverse-proxy load balancer that runs on port 80 with a pool of Apache servers running behind it. In addition to dispatching requests to the internal servers, Perlbal watches their responses for the critical X-REPROXY-URL header. When a backend returns that header, Perlbal fetches the file from the URL it contains (a path on a MogileFS storage node, looked up by key) and sends the content to the user itself.

To retrieve a file (say, http://img1.mydomain.com/fda832lf83daDFASFAFDS.jpg):

User: GET http://img1.mydomain.com/fda832lf83daDFASFAFDS.jpg

Perlbal: Hey Apache! GET http://img1.mydomain.com/fda832lf83daDFASFAFDS.jpg

Apache: Hey PHP! Look up fda832lf83daDFASFAFDS.jpg in MogileFS and give me back its path in an X-REPROXY-URL header.

PHP: Hey tracker! Give me the REAL HTTP PATH of this file (such as http://foo/dev1/0/0000/000001.fid)

PHP: Hey Apache! Here's the file path I just got; note the X-REPROXY-URL header and the MIME type.

Apache: Hey Perlbal! I have this weird X-REPROXY-URL thing with an internal file path, and also a MIME type, here you go!

Perlbal: Thanks Apache! I can now fetch the file and send its content to the user so you/mod_php don't have to!

(anyone want to create a graphical representation of the above?)
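
Until someone draws that diagram, the PHP step of the conversation can be sketched in a few lines. reproxy_headers() is a hypothetical helper written for this README, and the storage path and MIME type are the made-up values from the walkthrough above.

```php
<?php
// Hypothetical helper: the two header lines the backend hands back so that
// Perlbal fetches the file itself instead of PHP streaming it.
function reproxy_headers($storagePath, $mimeType)
{
  return array(
    'Content-Type: '.$mimeType,
    'X-REPROXY-URL: '.$storagePath,
  );
}

$headers = reproxy_headers('http://foo/dev1/0/0000/000001.fid', 'image/jpeg');
foreach ($headers as $line)
{
  // a real script would call header($line); echo keeps the sketch runnable on the CLI
  echo $line, "\n";
}
```

Note that the response carries no body: the file content never passes through PHP.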

Interface

To add a file to MogileFS:

$mfile = new sfMogileFSFile();
$mfile->setKey($mfs_key);
$mfile->setClass($mfs_class);
$mfile->setFile($local_file);
$mfile->save(); // saves to MogileFS

Alternatively:

$mfile = new sfMogileFSFile($key, $class, $file);
$mfile->save();

To retrieve a file directly from MogileFS as a string:

$mfile = sfMogileFS::loadFile($mfs_key);

To get a valid tracker path:

$mfile = sfMogileFS::loadFile($mfs_key);
$mfile->getPath();
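
For the curious, here is a rough sketch of what a path lookup might look like on the wire. It rests on an assumption about the tracker protocol (one text line per request, "command args" with URL-encoded arguments, answered by "OK args" or "ERR ..."), not on the plugin's source, and both function names are hypothetical. Check the MogileFS documentation before relying on it.

```php
<?php
// Assumed request format: "<command> <url-encoded args>\r\n".
function tracker_request($command, array $args)
{
  return $command.' '.http_build_query($args, '', '&')."\r\n";
}

// Assumed response format: "OK paths=N&path1=...&path2=..." or "ERR <code> <message>".
function tracker_paths($line)
{
  $line = rtrim($line, "\r\n");
  if (strpos($line, 'OK ') !== 0)
  {
    throw new RuntimeException('Tracker error: '.$line);
  }

  // parse_str() also URL-decodes the values
  parse_str(substr($line, 3), $fields);

  $paths = array();
  $count = isset($fields['paths']) ? (int) $fields['paths'] : 0;
  for ($i = 1; $i <= $count; $i++)
  {
    if (isset($fields['path'.$i]))
    {
      $paths[] = $fields['path'.$i];
    }
  }
  return $paths;
}

echo tracker_request('get_paths', array('domain' => 'mydomain', 'key' => 'userpicture:34:39'));
print_r(tracker_paths("OK paths=1&path1=http%3A%2F%2Ffoo%2Fdev1%2F0%2F0000%2F000001.fid\r\n"));
```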

Example Setup

This is a complete example of how you might want to handle photos. However, you should only use this as a guide and should make changes based on your own application and needs.

schema.yml:

  files:
    _attributes:       { phpName: File }
    id:
    user_id:           { type: integer, foreignTable: users, foreignReference: id, onDelete: cascade }
    file_type_id:      { type: integer, foreignTable: file_types, foreignReference: id, onDelete: cascade }
    mime_type:                 varchar(128)
    extension:                 varchar(8)
    title:                     varchar(128)
    description:               varchar(255)
    created_at:
    updated_at:

  photos:
    _attributes:            { phpName: Photo }
    id:
    file_id:                { type: integer, foreignTable: files, foreignReference: id, onDelete: cascade }
    photo_type_id:          { type: integer, foreignTable: photo_types, foreignReference: id, onDelete: cascade }
    photo_status_type_id:   { type: integer, foreignTable: photo_status_types, foreignReference: id, onDelete: cascade }
    mogilefs_key:           { type: varchar(40), phpName: MogileFSKey, required: true }
    width:                          integer
    height:                         integer
    filesize:                       integer
    created_at:

/apps/<app>/modules/file/actions/uploadAction.class.php:

class uploadAction extends sfAction
{
  public function execute()
  {
    if ($this->getRequest()->getMethod() == sfRequest::POST && $this->getRequest()->hasFiles())
    {
      if ($this->getRequest()->hasFileErrors())
      {
        $this->setFlash('notice', 'An error occurred while uploading ('.$this->getRequest()->getFileError('file').')');
        $this->redirect('@upload');
      }

      // save() method adds file to File and Photo tables as well as MogileFS
      $file = new File();
      $file->setUserId($this->getUser()->getId());
      $file->setFileTypeId(1); // image
      $file->setMimeType($this->getRequest()->getFileType('file'));
      $file->setExtension($this->getRequest()->getFileExtension('file'));
      $file->setTitle($this->getRequestParameter('title'));
      $file->setDescription($this->getRequestParameter('description'));
      $file->setTmpFile($this->getRequest()->getFilePath('file')); // not actually part of model but used by it
      $file->setFilesize($this->getRequest()->getFileSize('file')); // not actually part of the model but used by it
      $file->save();

      // delete upload
      unlink($file->getTmpFile());

      // redirect to upload form
      $this->setFlash('notice', 'Your file has been uploaded');
      $this->redirect('@upload');
    }
    else
    {
      // display upload form
      return sfView::SUCCESS;
    }
  }
}

/<project>/lib/model/Photo.php:

class Photo extends BasePhoto
{
  public function save($con = null)
  {
    if ($this->isNew())
    {
      $this->setMogileFSKey(hash('sha1', $this->getFileId().':'.$this->getId()));
    }
    parent::save($con);
  }
}
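
The key produced by Photo::save() can be checked outside the model. The snippet below uses the same hash() expression with made-up ids (12 and 34) and confirms that the result fits the varchar(40) mogilefs_key column from schema.yml.

```php
<?php
// Same expression as in Photo::save(); the two ids are invented for illustration.
$fileId  = 12;
$photoId = 34;

$key = hash('sha1', $fileId.':'.$photoId);

echo $key, "\n";
echo strlen($key), "\n"; // SHA-1 in hex is always 40 characters
```

One thing to keep in mind: on a brand-new Photo the auto-increment id has presumably not been assigned yet when save() runs, so the hash input may effectively be the file id followed by a colon. If you plan to store several photos per file, consider hashing something that differs per photo.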

/<project>/lib/model/File.php:

class File extends BaseFile
{
  protected $filesize;
  protected $tmp_file;

  public function setFilesize($fileSize)
  {
    $this->filesize = $fileSize;
  }

  public function getFilesize()
  {
    return $this->filesize;
  }

  public function setTmpFile($filePath)
  {
    $this->tmp_file = $filePath;
  }

  public function getTmpFile()
  {
    return $this->tmp_file;
  }

  /**
   * save
   * adds original file to file table, sub-file (photo, etc) table and MogileFS
   * All steps are wrapped in a transaction.
   *
   * @param mixed $con
   * @access public
   * @return void
   */
  public function save($con = null)
  {
    if (!$this->isNew())
    {
      // existing record: nothing to push to MogileFS, just persist the changes
      parent::save($con);
      return;
    }

    list($width, $height) = getimagesize($this->getTmpFile());

    // store it
    $con = Propel::getConnection();
    try
    {
      $con->begin();

      // save to File table
      parent::save($con);

      // add to photo table and MogileFS (this example only handles images)
      if ($this->getFileTypeId() == 1)
      {
        $photo = new Photo();
        $photo->setFileId($this->getId()); // parent file
        $photo->setPhotoTypeId(1); // original photo
        $photo->setPhotoStatusTypeId(1); // uploaded but not converted
        $photo->setWidth($width);
        $photo->setHeight($height);
        $photo->setFilesize($this->getFilesize());
        $photo->save($con);

        // save to MogileFS; kept inside this block because $photo only exists here
        $sfMogileFSFile = new sfMogileFSFile();
        $sfMogileFSFile->setKey($photo->getMogileFSKey());
        $sfMogileFSFile->setClass('orig_photo');
        $sfMogileFSFile->setFile($this->getTmpFile());
        $sfMogileFSFile->setFilesize($photo->getFilesize());
        $sfMogileFSFile->save();
      }

      $con->commit();
    }
    catch (Exception $e)
    {
      $con->rollback();
      throw $e;
    }
  }
}

Example #1 Displaying a Photo (slower)

Example assumes you already have a file in MogileFS with the key "cab9f3c712d04de874dafb0af0a0bf03e303e6e0".

routing.yml:

mfs:
  url:   /i/:mkey
  param: { module: file, action: showMogileFSFile }

/apps/<app>/modules/file/actions/showMogileFSFileAction.class.php:

class showMogileFSFileAction extends sfAction
{
  public function execute()
  {
    // remove file's extension
    $mogileFSKey = preg_replace('@\.[\w\d]+$@', '', $this->getRequestParameter('mkey'));

    // get mogilefs key from photo table
    $c = new Criteria();
    $c->add(PhotoPeer::MOGILEFS_KEY, $mogileFSKey);
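    $photo = PhotoPeer::doSelectOne($c);

    // unknown key: answer with a 404 instead of calling methods on null
    $this->forward404Unless($photo);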

    // load path from mogilefs
    $sfMogileFSRemoteFile = sfMogileFS::loadFile($mogileFSKey);

    // call perlbal's x-reproxy-url and set mime type
    $this->getResponse()->setHttpHeader('Content-type', $photo->getFile()->getMimeType(), true);
    $this->getResponse()->setHttpHeader('X-REPROXY-URL', $sfMogileFSRemoteFile->getPath(), true);

    return sfView::HEADER_ONLY;
  }
}

layout.php (or any template):

<?php use_helper('MogileFSAsset') ?>
<?php echo mfs_image_tag('cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg') ?>

This results in the following HTML rendering:

<img src="/index.php/i/cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg" />

Example #2 Displaying a Photo (faster)

Instead of invoking the entire symfony framework to output each of our MogileFS files, we can make the calls more lightweight by using a separate PHP script, and we can keep our pretty URLs with mod_rewrite.

# apache.conf file
# /i/fjDFJKEJKjfdjadsdkjfds798234432.jpg => /mogilefs.php?mkey=fjDFJKEJKjfdjadsdkjfds798234432
RewriteRule ^i/([A-Za-z0-9]+)[.A-Za-z0-9]*/?$ mogilefs.php?mkey=$1 [L]

<?php
//
// example mogilefs.php, to be placed in your web/ folder
//

require_once('/my/path/to/symfony/lib/config/sfConfig.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFSBaseFile.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFSRemoteFile.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFSConnection.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFS.class.php');

sfConfig::set('app_sfMogileFSPlugin_trackers', array('127.0.0.1:7001'));
sfConfig::set('app_sfMogileFSPlugin_domain', 'my_domain');

$mogileFSKey = preg_replace('@\.[\w\d]+$@', '', $_GET['mkey']);

$link = mysql_connect('127.0.0.1', 'user', 'pass');
mysql_select_db('mydb', $link);

$q = sprintf("select files.mime_type, photos.mogilefs_key from files 
              left join photos on (files.id = photos.file_id) 
              where photos.mogilefs_key='%s'", mysql_real_escape_string($mogileFSKey)
);

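$row = mysql_fetch_row(mysql_query($q, $link));

mysql_close($link);

// unknown key: answer with a 404 instead of sending a bogus reproxy header
if (!$row)
{
  header('HTTP/1.0 404 Not Found');
  exit;
}

$sfMogileFSRemoteFile = sfMogileFS::loadFile($row[1]);

header("Content-Type: {$row[0]}");
header("X-REPROXY-URL: {$sfMogileFSRemoteFile->getPath()}");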

Then in your template file just call the image like normal, i.e.

<img src="/i/cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg" />
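
If you want to check which URLs the RewriteRule pattern accepts before touching your Apache config, the same expression can be tried with preg_match(). This is only an approximation (PCRE in PHP rather than mod_rewrite itself), and mkey_from_path() is a hypothetical helper.

```php
<?php
// The RewriteRule pattern, wrapped in PCRE delimiters for preg_match().
$pattern = '@^i/([A-Za-z0-9]+)[.A-Za-z0-9]*/?$@';

// Hypothetical helper: returns the captured key, or null when the path does not match.
function mkey_from_path($pattern, $path)
{
  return preg_match($pattern, $path, $m) ? $m[1] : null;
}

// a SHA-1 style key with an extension: the extension is dropped from the capture
var_dump(mkey_from_path($pattern, 'i/cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg'));

// a key containing colons falls outside the allowed character set and is rejected
var_dump(mkey_from_path($pattern, 'i/userpicture:34:39.jpg'));
```

In other words, this rule suits hashed keys like the ones generated in the example setup; keys with punctuation would need a wider character class.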

Benchmarking Info

Simple ab tests reveal the following information (note the actual numbers are NOT what's important) about fetching a single 1544x1024 200KB JPEG image from MogileFS under various scenarios:

All else being constant...

  • No mod_rewrite. Request goes through Symfony: ~55ms
  • mod_rewrite. Request does NOT go through Symfony: ~12ms
  • mod_rewrite, memcached (skipping a db select): ~12ms
  • mod_rewrite, memcached (skipping a db select, and skipping MogileFS lookups, though image still fetched from tracker and hard drive): ~9ms
  • Image fetched directly from memcached (just out of curiosity - coming soon): ~?
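
The "skipping a db select" scenarios boil down to a cache-aside lookup. The sketch below shows the idea with an in-memory stand-in so that it runs anywhere; ToyCache and cached_mime_type() are hypothetical names, and a real setup would talk to memcached through a client library instead.

```php
<?php
// In-memory stand-in for memcached (assumption: get() returns false on a miss).
class ToyCache
{
  private $items = array();

  public function get($key)
  {
    return isset($this->items[$key]) ? $this->items[$key] : false;
  }

  public function set($key, $value)
  {
    $this->items[$key] = $value;
  }
}

// Hypothetical lookup: $fetchFromDb is any callable that returns the MIME type for a key.
function cached_mime_type(ToyCache $cache, $mkey, $fetchFromDb)
{
  $mime = $cache->get('mime:'.$mkey);
  if ($mime === false)
  {
    $mime = call_user_func($fetchFromDb, $mkey); // cache miss: one database select
    $cache->set('mime:'.$mkey, $mime);
  }
  return $mime;
}

// count how often the "database" is actually consulted
$dbCalls = 0;
$fetch = function ($mkey) use (&$dbCalls) { $dbCalls++; return 'image/jpeg'; };

$cache = new ToyCache();
cached_mime_type($cache, 'cab9f3c7', $fetch);
cached_mime_type($cache, 'cab9f3c7', $fetch);
echo $dbCalls, "\n"; // the second call is served from the cache
```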

MogileFS FAQ

Questions and answers have been gleaned from personal experience as well as the MogileFS Mailing List.

How can I maximize browser download speed?

  • Set up 2-3 hosts/aliases (img1.mydomain.com, img2..., etc.) to serve images from so browsers will download them concurrently. Saunders' new book talks about this technique (listed in Optimization).
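
One way to spread images over those aliases is to derive the host from the key, so a given image always comes from the same hostname and stays cacheable. This is a sketch under assumptions: image_host() is a hypothetical helper, and the imgN.mydomain.com naming simply follows the example above.

```php
<?php
// Hypothetical helper: maps a MogileFS key to one of $hostCount image hosts.
// The same key always yields the same host, which keeps browser caches effective.
function image_host($mkey, $hostCount = 3, $domain = 'mydomain.com')
{
  // abs() guards against negative crc32() results on 32-bit PHP builds
  $n = (abs(crc32($mkey)) % $hostCount) + 1;

  return 'img'.$n.'.'.$domain;
}

$key = 'cab9f3c712d04de874dafb0af0a0bf03e303e6e0';
echo 'http://', image_host($key), '/i/', $key, ".jpg\n";
```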

How can I prevent the database from performing the same reads over and over again as the same files are being requested over and over?

What other caching methods are available?

  • Perlbal can cache but it's best used for its reproxying ability. Squid is often run on top of Perlbal for caching.

MogileFS Summit [pdf] (9/19/2006)

Todo

  • Graphical representation of how a typical request works
  • MogileFS administration via backend

Changelog

trunk

  • bmeynell: updated README

2007-09-22 | 0.6.0 Beta

  • bmeynell: Initial release

2007-07-19 | 0.6.0 Alpha

  • bmeynell: Initial release
