sfMogileFSPlugin
Enables symfony applications to interact with MogileFS. MogileFS (the name is an anagram of "OMG Files") was created by LiveJournal to handle the storage, replication, and retrieval of the large volume of file uploads the site received and continues to receive. Many of the web's most popular sites use MogileFS as their file store, including Blip.tv, Digg, Last.fm, Friendster, Guba, Vox, and many others.
MogileFS Terminology
The following explanations simply restate what is already on the official MogileFS website. However, we hope to expand upon them as needed.
MogileFS Components:
- Trackers: Act as the intermediary between the client library, the database, and the storage nodes.
- Database: maintains a list of storage nodes, files, etc.
- Storage Nodes: Physical disks where files are actually stored.
- Client Library: Talks to trackers. sfMogileFSPlugin acts as the client library in our case.
File Organization:
- Domain: Top level separation of files. A domain consists of a set of classes that define the files within it.
- Class: Part of exactly one domain. A Class, really, 'only specifies the minimum replica count of a file'. Examples of classes: image, thumbnail, video.
- Key: A unique textual string that identifies a file. Keys are unique within domains. Examples of keys: userpicture:34:39, Bfjkd34284FDFD43432FDJKLDFjfdmnb.
- Minimum Replica Count: Property of a 'class' that defines how many times the files in that class are to be replicated onto different devices.
- File: A defined collection of bits stored in MogileFS. Files are replicated according to their minimum replica count. Each file has a 'key', is a part of one 'class', and is located in one 'domain'.
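The relationships above can be pictured as a small data model. The following is a minimal sketch in plain PHP, not part of MogileFS or of this plugin; the `$domains` array, the `replicaCountFor()` helper, and the replica counts are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical in-memory model of MogileFS file organization.
// A domain holds classes; each class only carries a minimum replica count.
$domains = array(
    'mydomain' => array(
        'image'     => 3, // keep at least 3 copies of every image
        'thumbnail' => 2, // thumbnails can be regenerated, so fewer copies
    ),
);

// A file is identified by (domain, key) and belongs to exactly one class.
$file = array(
    'domain' => 'mydomain',
    'class'  => 'image',
    'key'    => 'userpicture:34:39',
);

/**
 * Look up how many replicas a file should have, based on its class.
 * Throws if the domain or class is unknown.
 */
function replicaCountFor(array $domains, array $file)
{
    $domain = $file['domain'];
    $class  = $file['class'];
    if (!isset($domains[$domain][$class])) {
        throw new InvalidArgumentException("Unknown class '$class' in domain '$domain'");
    }
    return $domains[$domain][$class];
}

echo replicaCountFor($domains, $file), "\n"; // prints 3
```

Note that the replica count hangs off the class, not the file: changing a class changes the policy for every file in it.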
MogileFS Installation
Since there are no good installation instructions available for MogileFS and since it's always been a bit of a black art, they are given here for now. Much inspiration was taken from Brett Durrett's original instructions. These instructions have only been tested on Debian. It would be nice to have a MogileFS Debian package maintainer.
For simplicity's sake, it is assumed that you'll initially be running everything (MogileFS database, trackers, and storage) on one server with an IP address of 127.0.0.1.
Download the MogileFS trunk.
$ cd; mkdir mfs-trunk; cd mfs-trunk
$ svn co http://code.sixapart.com/svn/mogilefs/trunk .
Install all dependencies.
$ perl Makefile.PL; make; make test
Note the dependencies listed by make test and either install them by hand or via CPAN.
Install dependency from CPAN (example):
# cpan
cpan> install Gearman::Client
... and just follow the directions ...
If you have to debug because of problems installing, you can set the environment variable VERBOSE to true like so:
$ VERBOSE=1; export VERBOSE
You will need libc6-dev; install it if you don't already have it.
# apt-get install libc6-dev
Install package directly (example):
Either download the package and extract it or download it via CPAN (and find it in ~/.cpan/build), then run:
$ perl Makefile.PL; make; make test
... and if all tests pass ...
# make install
Once all dependencies are installed you can move on to actually configuring MogileFS.
$ cd ~/mfs-trunk/server
$ perl Makefile.PL; make; make test
# make install
$ cd ~/mfs-trunk/utils
$ perl Makefile.PL; make; make test
# make install
Set up hostnames for the various MogileFS services/daemons:
# vi /etc/hosts
127.0.0.1 mfs-db mfs-files
Next, set up the MogileFS database by typing the following command and following the instructions:
$ cd ~/mfs-trunk/server
$ ./mogdbsetup --dbhost=mfs-db.mydomain.com --dbname=mogilefs --dbuser=mogile --dbpass=mypass --dbrootuser=root --dbrootpass=myrootpass
Set up trackers:
# adduser mogilefs
# cd /etc; mkdir mogilefs; vi mogilefs/mogilefsd.conf
... and add the following ...
db_dsn DBI:mysql:mogilefs:mfs-db.mydomain.com
db_user mogile
db_pass mypass
conf_port 7001
listener_jobs 5
trackers = 127.0.0.1
Set up storage node:
# mkdir /var/mogilefs
# vi /etc/mogilefs/mogstored.conf
... and add the following ...
httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/var/mogilefs
Start storage node:
# mogstored --daemon
Start trackers:
$ su mogilefs
mogilefs@mydomain:~$ mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
mogilefs@mydomain:~$ exit
Add storage server to MogileFS database:
$ mogadm host add mfs-files --ip=127.0.0.1 --status=alive
$ mogadm device add mfs-files 1
# mkdir /var/mogilefs/dev1
Set up Perlbal:
# mkdir /etc/perlbal
# touch /etc/perlbal/perlbal.conf
# touch /etc/perlbal/nodelist.dat
... add the following to perlbal.conf ...
CREATE POOL my_apaches
# change ip address and port as appropriate
# remember that Perlbal runs on port 80 and your site runs on another port
POOL my_apaches ADD 111.222.333.444:8080
CREATE SERVICE balancer
  SET listen = 111.222.333.444:80
  SET pool = my_apaches
  SET role = reverse_proxy
  SET enable_reproxy = true
  SET persist_client = on
  SET persist_backend = on
  SET verify_backend = on
ENABLE balancer
CREATE POOL dynamic
  SET nodefile = nodelist.dat
... add the following to nodelist.dat ...
111.222.333.444
Additional Configuration:
Create a domain and a class (Note: this is domain specific)
$ mogadm domain add mydomain
$ mogadm class add mydomain my_file_class
Be aware that all configuration should be done via the mogadm tool. Simply type the following for a list of commands:
$ mogadm
sfMogileFSPlugin Installation
$ php symfony plugin-install http://plugins.symfony-project.com/sfMogileFSPlugin
In your app.yml file add:
dev:
  sfMogileFSPlugin:
    domain: mogilefs_domain
    timeout: 5
    trackers: [111.111.111.1:7001, 222.222.222.2:7001]
You will also need to define settings for your production environment.
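symfony exposes app.yml settings under flattened names such as app_sfMogileFSPlugin_trackers, which is the form the standalone mogilefs.php script later in this document sets by hand. As a rough illustration of that naming, here is a minimal sketch in plain PHP; `flattenAppSettings()` is a hypothetical stand-in, not symfony's actual config handler.

```php
<?php
// The "dev:" section of app.yml above, written as a PHP array.
$settings = array(
    'sfMogileFSPlugin' => array(
        'domain'   => 'mogilefs_domain',
        'timeout'  => 5,
        'trackers' => array('111.111.111.1:7001', '222.222.222.2:7001'),
    ),
);

/**
 * Flatten two levels of settings into "app_<category>_<key>" names.
 * Hypothetical helper for illustration only.
 */
function flattenAppSettings(array $settings)
{
    $flat = array();
    foreach ($settings as $category => $values) {
        foreach ($values as $key => $value) {
            $flat['app_' . $category . '_' . $key] = $value;
        }
    }
    return $flat;
}

$config = flattenAppSettings($settings);
echo $config['app_sfMogileFSPlugin_domain'], "\n";          // prints mogilefs_domain
echo count($config['app_sfMogileFSPlugin_trackers']), "\n"; // prints 2
```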
Modify your routing.yml (feel free to change "i", module, and action to whatever you'd like).
mfs:
  url: /i/:mkey
  param: { module: file, action: showMogileFSFile }
Lastly, you'll need cURL support enabled in your PHP installation.
Installation complete. Clear your cache.
N.B. This plugin was designed to primarily add and retrieve files from MogileFS. You will need to create your own media handling model(s), module(s) and action(s), though an example setup is given below.
How It All Works
Perlbal acts as a reverse-proxy load balancer that runs on port 80 with a pool of Apache servers running behind it. In addition to dispatching requests to the internal servers, Perlbal honors the critical X-REPROXY-URL header that a backend sends in its response: instead of relaying the backend's body, Perlbal fetches the file from the MogileFS storage node named in the header and sends it to the user.
To retrieve a file (say, http://img1.mydomain.com/fda832lf83daDFASFAFDS.jpg):
User: GET http://img1.mydomain.com/fda832lf83daDFASFAFDS.jpg
Perlbal: Hey Apache! GET http://img1.mydomain.com/fda832lf83daDFASFAFDS.jpg
Apache: Hey PHP! GET fda832lf83daDFASFAFDS.jpg from MogileFS and return me the path and X-REPROXY-URL header.
PHP: Hey Tracker! Give me the REAL HTTP PATH of this file (such as http://foo/dev1/0/0000/000001.fid)
PHP: Hey Apache! Here's the file path I just got. Note the X-REPROXY-URL header and the mime type.
Apache: Hey Perlbal! I have this weird X-REPROXY-URL thing with an internal file path, and also a mime type, here you go!
Perlbal: Thanks Apache! I can now send the content to the user myself so you/mod_php don't have to!
(anyone want to create a graphical representation of the above?)
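From the backend's point of view, the whole exchange boils down to answering with two headers and no body. The sketch below shows that step in plain PHP; `buildReproxyHeaders()` is a hypothetical helper, not part of the plugin, and the storage URL is the placeholder path used above.

```php
<?php
/**
 * Build the headers a backend sends so that Perlbal serves the file itself.
 * Hypothetical helper; $storagePath is the internal URL returned by MogileFS.
 */
function buildReproxyHeaders($storagePath, $mimeType)
{
    // Reject CR/LF so a crafted value cannot inject extra headers.
    foreach (array($storagePath, $mimeType) as $value) {
        if (preg_match('/[\r\n]/', $value)) {
            throw new InvalidArgumentException('Header values must not contain line breaks');
        }
    }
    return array(
        'Content-Type: ' . $mimeType,
        'X-REPROXY-URL: ' . $storagePath,
    );
}

$headers = buildReproxyHeaders('http://foo/dev1/0/0000/000001.fid', 'image/jpeg');
foreach ($headers as $line) {
    // A real response would call header($line); echo keeps the sketch runnable on the CLI.
    echo $line, "\n";
}
```

Because the internal storage URL travels only between the backend and Perlbal, the user never sees it.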
Interface
To add a file to MogileFS:
$mfile = new sfMogileFSFile();
$mfile->setKey($mfs_key);
$mfile->setClass($mfs_class);
$mfile->setFile($local_file);
$mfile->save(); // saves to MogileFS
Alternatively:
$mfile = new sfMogileFSFile($key, $class, $file);
$mfile->save();
To retrieve a file directly from MogileFS as a string:
$mfile = sfMogileFS::loadFile($mfs_key);
To get a valid tracker path:
$mfile = sfMogileFS::loadFile($mfs_key);
$mfile->getPath();
Example Setup
This is a complete example of how you might want to handle photos. However, you should use it only as a guide and make changes based on your own application and needs.
schema.yml:
files:
  _attributes: { phpName: File }
  id:
  user_id: { type: integer, foreignTable: users, foreignReference: id, onDelete: cascade }
  file_type_id: { type: integer, foreignTable: file_types, foreignReference: id, onDelete: cascade }
  mime_type: varchar(128)
  extension: varchar(8)
  title: varchar(128)
  description: varchar(255)
  created_at:
  updated_at:
photos:
  _attributes: { phpName: Photo }
  id:
  file_id: { type: integer, foreignTable: files, foreignReference: id, onDelete: cascade }
  photo_type_id: { type: integer, foreignTable: photo_types, foreignReference: id, onDelete: cascade }
  photo_status_type_id: { type: integer, foreignTable: photo_status_types, foreignReference: id, onDelete: cascade }
  mogilefs_key: { type: varchar(40), phpName: MogileFSKey, required: true }
  width: integer
  height: integer
  filesize: integer
  created_at:
/apps/<app>/modules/file/actions/uploadAction.class.php:
class uploadAction extends sfAction
{
  public function execute()
  {
    if ($this->getRequest()->getMethod() == sfRequest::POST && $this->getRequest()->hasFiles())
    {
      if ($this->getRequest()->hasFileErrors())
      {
        $this->setFlash('notice', 'An error occurred while uploading ('.$this->getRequest()->getFileError('file').')');
        $this->redirect('@upload');
      }

      // save() method adds file to File and Photo tables as well as MogileFS
      $file = new File();
      $file->setUserId($this->getUser()->getId());
      $file->setFileTypeId(1); // image
      $file->setMimeType($this->getRequest()->getFileType('file'));
      $file->setExtension($this->getRequest()->getFileExtension('file'));
      $file->setTitle($this->getRequestParameter('title'));
      $file->setDescription($this->getRequestParameter('description'));
      $file->setTmpFile($this->getRequest()->getFilePath('file')); // not actually part of the model but used by it
      $file->setFilesize($this->getRequest()->getFileSize('file')); // not actually part of the model but used by it
      $file->save();

      // delete upload
      unlink($file->getTmpFile());

      // redirect to upload form
      $this->setFlash('notice', 'Your file has been uploaded');
      $this->redirect('@upload');
    }
    else
    {
      // display upload form
      return sfView::SUCCESS;
    }
  }
}
/<project>/lib/model/Photo.php:
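class Photo extends BasePhoto
{
  public function save($con = null)
  {
    if ($this->isNew())
    {
      // note: a new record has no id yet, so getId() is still null here
      // and the key is effectively derived from the file id alone
      $this->setMogileFSKey(hash('sha1', $this->getFileId().':'.$this->getId()));
    }
    parent::save($con);
  }
}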
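To see what kind of key Photo::save() produces, the hashing step can be tried in isolation. This is a minimal sketch in plain PHP; `photoKey()` is a hypothetical helper that mirrors the expression used above, and the ids are made up.

```php
<?php
/**
 * Derive a MogileFS key the same way Photo::save() does above:
 * the SHA-1 of "<file id>:<photo id>".
 */
function photoKey($fileId, $photoId = null)
{
    return hash('sha1', $fileId . ':' . $photoId);
}

$key = photoKey(12, 7);
echo strlen($key), "\n";      // prints 40, matching varchar(40) in schema.yml
var_dump(ctype_xdigit($key)); // bool(true): only hex characters

// A new, unsaved Photo has no id yet, so only the file id contributes.
var_dump(photoKey(12) === hash('sha1', '12:')); // bool(true)
```

Since the key depends only on the file id for new records, two new photos that share a parent file would get the same key; an application storing several photos per file may want a different key scheme.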
/<project>/lib/model/File.php:
class File extends BaseFile
{
  // transient values used by save(); they are not columns of the files table
  protected $filesize;
  protected $tmp_file;

  public function setFilesize($fileSize)
  {
    $this->filesize = $fileSize;
  }

  public function getFilesize()
  {
    return $this->filesize;
  }

  public function setTmpFile($filePath)
  {
    $this->tmp_file = $filePath;
  }

  public function getTmpFile()
  {
    return $this->tmp_file;
  }

  /**
   * save
   * adds original file to file table, sub-file (photo, etc) table and MogileFS
   * All steps are wrapped in a transaction.
   *
   * @param mixed $con
   * @access public
   * @return void
   */
  public function save($con = null)
  {
    if (!$this->isNew())
    {
      // existing records only need the regular Propel save
      parent::save($con);
      return;
    }

    list($width, $height) = getimagesize($this->getTmpFile());

    // store it
    $con = Propel::getConnection();
    try
    {
      $con->begin();

      // save to File table
      parent::save($con);

      if ($this->getFileTypeId() == 1)
      {
        // add to photo table
        $photo = new Photo();
        $photo->setFileId($this->getId()); // parent file
        $photo->setPhotoTypeId(1); // original photo
        $photo->setPhotoStatusTypeId(1); // uploaded but not converted
        $photo->setWidth($width);
        $photo->setHeight($height);
        $photo->setFilesize($this->getFilesize());
        $photo->save($con); // same connection, so it is part of the transaction

        // save to MogileFS ($photo only exists for images, so this stays inside the if)
        $sfMogileFSFile = new sfMogileFSFile();
        $sfMogileFSFile->setKey($photo->getMogileFSKey());
        $sfMogileFSFile->setClass('orig_photo');
        $sfMogileFSFile->setFile($this->getTmpFile());
        $sfMogileFSFile->setFilesize($photo->getFilesize());
        $sfMogileFSFile->save();
      }

      $con->commit();
    }
    catch (Exception $e)
    {
      $con->rollback();
      throw $e;
    }
  }
}
Example #1 Displaying a Photo (slower)
Example assumes you already have a file in MogileFS with the key "cab9f3c712d04de874dafb0af0a0bf03e303e6e0".
routing.yml:
mfs:
  url: /i/:mkey
  param: { module: file, action: showMogileFSFile }
/apps/<app>/modules/file/actions/showMogileFSFileAction.class.php:
class showMogileFSFileAction extends sfAction
{
  public function execute()
  {
    // remove file's extension
    $mogileFSKey = preg_replace('@\.[\w\d]+$@', '', $this->getRequestParameter('mkey'));

    // get mogilefs key from photo table
    $c = new Criteria();
    $c->add(PhotoPeer::MOGILEFS_KEY, $mogileFSKey);
    $photo = PhotoPeer::doSelectOne($c);

    // unknown key: respond with a 404 instead of failing on a null $photo
    $this->forward404Unless($photo);

    // load path from mogilefs
    $sfMogileFSRemoteFile = sfMogileFS::loadFile($mogileFSKey);

    // call perlbal's x-reproxy-url and set mime type
    $this->getResponse()->setHttpHeader('Content-type', $photo->getFile()->getMimeType(), true);
    $this->getResponse()->setHttpHeader('X-REPROXY-URL', $sfMogileFSRemoteFile->getPath(), true);

    return sfView::HEADER_ONLY;
  }
}
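The first step of the action, turning the :mkey route parameter into a MogileFS key, can be tried on its own. This is a minimal sketch in plain PHP using the same pattern as the action above; `keyFromRouteParameter()` is a hypothetical wrapper added only for illustration.

```php
<?php
/**
 * Strip a trailing ".ext" from the :mkey route parameter,
 * using the same pattern as the action above.
 */
function keyFromRouteParameter($mkey)
{
    return preg_replace('@\.[\w\d]+$@', '', $mkey);
}

echo keyFromRouteParameter('cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg'), "\n";
// prints cab9f3c712d04de874dafb0af0a0bf03e303e6e0
echo keyFromRouteParameter('cab9f3c712d04de874dafb0af0a0bf03e303e6e0'), "\n";
// prints the key unchanged: there is no extension to remove
echo keyFromRouteParameter('photo.tar.gz'), "\n";
// prints photo.tar: only the last extension is removed
```

The extension is purely cosmetic here: the MIME type sent to the browser comes from the files table, not from the URL.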
layout.php (or any template):
<?php use_helper('MogileFSAsset') ?>
<?php echo mfs_image_tag('cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg') ?>
This results in the following HTML rendering:
<img src="/index.php/i/cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg" />
Example #2 Displaying a Photo (faster)
Instead of calling the entire symfony framework to output each of our MogileFS files, we can make the calls more lightweight by using a separate PHP script, and we can keep our pretty URLs with mod_rewrite.
# apache.conf file
# /i/fjDFJKEJKjfdjadsdkjfds798234432.jpg => /mogilefs.php?mkey=fjDFJKEJKjfdjadsdkjfds798234432
RewriteRule ^i/([A-Za-z0-9]+)[.A-Za-z0-9]*/?$ mogilefs.php?mkey=$1 [L]
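Before deploying the rule, it can help to check which paths it captures. The sketch below runs the same pattern through PHP's PCRE functions; it assumes the rule is evaluated in per-directory context (where the leading slash is already stripped), and `captureMkey()` is a hypothetical helper.

```php
<?php
// The RewriteRule pattern from above, wrapped in PCRE delimiters.
// Assumes per-directory context, where the leading slash is already stripped.
$pattern = '@^i/([A-Za-z0-9]+)[.A-Za-z0-9]*/?$@';

/**
 * Return the mkey the rule would capture as $1, or null if the path does not match.
 */
function captureMkey($pattern, $path)
{
    return preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

var_dump(captureMkey($pattern, 'i/fjDFJKEJKjfdjadsdkjfds798234432.jpg')); // the key without ".jpg"
var_dump(captureMkey($pattern, 'i/fjDFJKEJKjfdjadsdkjfds798234432'));     // the same key
var_dump(captureMkey($pattern, 'i/user_picture.jpg'));                   // NULL: "_" is not allowed
var_dump(captureMkey($pattern, 'images/foo.jpg'));                       // NULL: wrong prefix
```

Keys containing characters outside A-Z, a-z and 0-9 (such as the userpicture:34:39 style) would not match this rule, so it suits hash-style keys only.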
//
// example mogilefs.php, to be placed in your web/ folder
//
require_once('/my/path/to/symfony/lib/config/sfConfig.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFSBaseFile.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFSRemoteFile.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFSConnection.class.php');
require_once('../plugins/sfMogileFSPlugin/lib/sfMogileFS.class.php');

sfConfig::set('app_sfMogileFSPlugin_trackers', array('127.0.0.1:7001'));
sfConfig::set('app_sfMogileFSPlugin_domain', 'my_domain');

// remove file's extension
$mogileFSKey = preg_replace('@\.[\w\d]+$@', '', $_GET['mkey']);

// look up the mime type for this key
$link = mysql_connect('127.0.0.1', 'user', 'pass');
mysql_select_db('mydb', $link);
$q = sprintf("select files.mime_type, photos.mogilefs_key from files
  left join photos on (files.id = photos.file_id)
  where photos.mogilefs_key='%s'", mysql_real_escape_string($mogileFSKey)
);
$row = mysql_fetch_row(mysql_query($q));
mysql_close($link);

// unknown key: answer with a 404 instead of reproxying
if (!$row)
{
  header('HTTP/1.0 404 Not Found');
  exit;
}

// hand the internal path to perlbal and set the mime type
$sfMogileFSRemoteFile = sfMogileFS::loadFile($row[1]);
header("Content-Type: {$row[0]}");
header("X-REPROXY-URL: {$sfMogileFSRemoteFile->getPath()}");
Then in your template file just call the image like normal, i.e.
<img src="/i/cab9f3c712d04de874dafb0af0a0bf03e303e6e0.jpg" />
Benchmarking Info
Simple ab tests reveal the following information (note the actual numbers are NOT what's important) about fetching a single 1544x1024 200KB JPEG image from MogileFS under various scenarios:
All else being constant...
- No mod_rewrite. Request goes through Symfony: ~55ms
- mod_rewrite. Request does NOT go through Symfony: ~12ms
- mod_rewrite, memcached (skipping a db select): ~12ms
- mod_rewrite, memcached (skipping a db select, and skipping MogileFS lookups, though image still fetched from tracker and hard drive): ~9ms
- Image fetched directly from memcached (just out of curiosity - coming soon): ~?
MogileFS FAQ
Questions and answers have been gleaned from personal experience as well as the MogileFS Mailing List.
How can I maximize browser download speed?
- Set up 2-3 hosts/aliases (img1.mydomain.com, img2..., etc.) to serve images from so browsers will download them concurrently. Saunders' new book talks about this technique (listed in Optimization).
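One way to spread images over such aliases is to derive the hostname from the key, so a given image always gets the same URL and remains cacheable. The following is a minimal sketch under that assumption; `imageHostFor()`, the host count, and the domain are hypothetical.

```php
<?php
/**
 * Pick one of N image hostnames for a key.
 * Hypothetical helper: hashing the key keeps the choice stable, so a given
 * image always has the same URL and stays cacheable by browsers.
 */
function imageHostFor($key, $hostCount = 3, $domain = 'mydomain.com')
{
    // The first 7 hex digits of the SHA-1 always fit in a positive 32-bit integer.
    $bucket = hexdec(substr(sha1($key), 0, 7)) % $hostCount;
    return 'img' . ($bucket + 1) . '.' . $domain;
}

$key = 'cab9f3c712d04de874dafb0af0a0bf03e303e6e0';
echo 'http://' . imageHostFor($key) . '/i/' . $key . '.jpg', "\n";
```

Picking a host at random on every page view would also spread the load, but it would give the same image several URLs and defeat browser caching.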
How can I prevent the database from performing the same reads over and over again as the same files are being requested over and over?
- Add a memcached layer for query caching.
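The usual shape of such a layer is cache-aside: look in the cache first and only query the database on a miss. Below is a minimal sketch of that pattern in plain PHP. To stay runnable without a memcached server it uses a small array-backed stand-in; `ArrayCache`, `cachedMimeType()`, and the loader callback are all hypothetical names.

```php
<?php
/**
 * Tiny array-backed cache standing in for memcached in this sketch.
 * A real setup would call a memcached client instead.
 */
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

/**
 * Cache-aside lookup: try the cache first, fall back to $loader on a miss.
 * $loader stands for the SQL query that fetches the MIME type for a key.
 */
function cachedMimeType($cache, $mogileFSKey, $loader)
{
    $cacheKey = 'mime:' . $mogileFSKey;
    $mimeType = $cache->get($cacheKey);
    if ($mimeType === false) {
        $mimeType = call_user_func($loader, $mogileFSKey);
        $cache->set($cacheKey, $mimeType);
    }
    return $mimeType;
}

$dbReads = 0;
$loader = function ($key) use (&$dbReads) {
    $dbReads++;          // count simulated database reads
    return 'image/jpeg'; // placeholder result
};

$cache = new ArrayCache();
cachedMimeType($cache, 'cab9f3c7', $loader);
cachedMimeType($cache, 'cab9f3c7', $loader);
echo $dbReads, "\n"; // prints 1: the second call was served from the cache
```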
What other caching methods are available?
- Perlbal can cache but it's best used for its reproxying ability. Squid is often run on top of Perlbal for caching.
Links
MogileFS Summit [pdf] (9/19/2006)
Todo
- Graphical representation of how a typical request works
- MogileFS administration via backend
Changelog
trunk
- bmeynell: updated README
2007-09-22 | 0.6.0 Beta
- bmeynell: Initial release
2007-07-19 | 0.6.0 Alpha
- bmeynell: Initial release
Attachments
- sfMogileFSPlugin-0.6.0.tgz (8.5 kB) - added by Benjamin.Meynell on 07/20/07 03:53:36.