Tropical Software Observations

26 January 2010

Posted by Kamal Fariz at 6:16 PM


Full-text search in Rails with Sunspot

There comes a time in every app when doing a “SQL LIKE” query just doesn’t cut it. I’m going to show you how easy it is to add proper full-text search to your Rails app using the Sunspot::Rails plugin.

Sunspot

Sunspot is a standalone Ruby library that makes integrating with a Solr search engine a cinch. It wraps the nitty-gritty of indexing and querying in a declarative DSL that you can use to expose virtually any Ruby object to search, not just ActiveRecord models. The sunspot gem bundles a standalone Solr server (mostly stock and served by Jetty, although it also includes support for geolocational ordering).

Sunspot::Rails is a Rails plugin that is essentially the Sunspot library plus two sets of hooks: into ActiveRecord, to update the index whenever records are created or updated, and into the Rails request lifecycle, to commit the index at the end of every request. It adds the DSL as class methods on ActiveRecord, so you configure an index much as you would configure associations or named_scopes. The gem also bundles a set of rake tasks for starting, stopping and restarting the Solr service.
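The commit-at-end-of-request behaviour is easier to picture with a toy in-memory index. This is only an illustration: `ToyIndex` and its methods are hypothetical and say nothing about how Solr or Sunspot actually store documents. The point is simply that queued updates stay invisible to searches until a commit happens.

```ruby
# Hypothetical toy index: saves queue documents, and they only become
# searchable once commit! runs (Sunspot::Rails does the real equivalent
# against Solr at the end of each request).
class ToyIndex
  def initialize
    @pending = {}    # id => text queued during the current request
    @committed = {}  # id => text visible to searches
  end

  # What a save hook would call: queue the document, don't publish it yet.
  def add(id, text)
    @pending[id] = text
  end

  # What the end-of-request hook would call: publish everything queued.
  def commit!
    @committed.merge!(@pending)
    @pending.clear
  end

  # Ids of committed documents containing the word (case-insensitive).
  def search(word)
    @committed.select { |_id, text| text.downcase.split.include?(word.downcase) }.keys
  end
end

index = ToyIndex.new
index.add(1, "Full-text search with Solr")
index.search("solr")  # => [] (not committed yet)
index.commit!
index.search("solr")  # => [1]
```

Batching the commit this way means one request that saves several records pays for a single commit rather than one per save.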

Installation

  1. Install the gem
    $ gem install sunspot_rails
  2. Edit your config/environment.rb to include:
    config.gem 'sunspot', :lib => 'sunspot'
    config.gem 'sunspot_rails', :lib => 'sunspot/rails'
  3. Generate the Sunspot configuration file, config/sunspot.yml:
    $ ./script/generate sunspot
  4. Run the Solr service
    $ rake sunspot:solr:start
    (if Rake complains that it couldn’t find this task, add require 'sunspot/rails/tasks' to the top of your Rakefile).

Defining an Index

The first thing you need to do before anything can be searched is create an index. There are two parts to this.

The first part is defining the index. For Rails models, you do this with the searchable class method. Suppose we have an Article that belongs to an Author:
class Article < ActiveRecord::Base
  belongs_to :author

  searchable do
    text :title, :boost => 2.0   # title matches weigh twice as much
    text :body
    text :author_name do         # virtual field pulled through the association
      author.name
    end
    time :updated_at             # for sorting by most recent
    string :sort_title do        # for sorting by title, ignoring a leading A/An/The
      title.downcase.sub(/\A(an?|the)\s+/, '')
    end
    boolean :published, :using => :published?
  end

  # Assumes `state` is stored as a string column (e.g. "published").
  def published?
    state == 'published'
  end
end
Sunspot supports text, string, time, boolean, integer and float fields. When planning what to index, note that only text fields are full-text searchable; the other field types are used for restricting, sorting and faceting.
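The difference between text and string fields can be shown with a toy matcher. This is a simplification for illustration only (Solr's real text analysis uses configurable tokenizers and filters), and both helper names are hypothetical: a text field is broken into words, so a keyword can match any word in it, while a string field is compared as one exact value, which is what makes it suitable for restricting and sorting.

```ruby
# Toy text-field match: lowercase, split into words, look for the keyword.
def text_field_match?(value, keyword)
  value.downcase.scan(/\w+/).include?(keyword.downcase)
end

# Toy string-field match: the whole stored value must equal the query.
def string_field_match?(value, query)
  value == query
end

title = "Full-text search in Rails"
text_field_match?(title, "rails")                        # => true
string_field_match?(title, "rails")                      # => false
string_field_match?(title, "Full-text search in Rails")  # => true
```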

What I like about the DSL is its flexibility. You can index an ActiveRecord attribute directly (:title, :body), or a virtual attribute by passing a block (:sort_title) or by naming a method with :using (:published). Even indexing associations is just a matter of calling methods on them (:author_name).
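Because a block such as :sort_title is plain Ruby, it can be checked without Solr. The two helpers below are hypothetical, extracted only for illustration. A pattern such as /^(an?|the)/ with no trailing whitespace also chops the start of words like "Another" and leaves a leading space behind "The"; anchoring with \A and requiring whitespace after the article strips only a whole leading article.

```ruby
# Naive version: no whitespace requirement after the article.
def naive_sort_title(title)
  title.downcase.gsub(/^(an?|the)/, '')
end

# Strip a single leading "a", "an" or "the" only when followed by whitespace.
def sort_title(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

naive_sort_title("The Rails Way")  # => " rails way"
naive_sort_title("Another Story")  # => "other story"
sort_title("The Rails Way")        # => "rails way"
sort_title("Another Story")        # => "another story"
sort_title("A Tale")               # => "tale"
```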

The second part is indexing. Sunspot provides a utility method to reindex all records for a particular class. In our example, we can call

Article.reindex!

and have the entire Article index rebuilt. For finer-grained indexing, you can call Article#index! on a particular instance. As mentioned above, if you create and update models through controllers, as in a typical Rails app, this all happens transparently.

Querying

Sunspot also provides a flexible DSL for querying. A SearchController might look something like this:
class SearchController < ApplicationController
  def show
    @search = search(params)
  end

  protected

  def search(options)
    Sunspot.search(Article) do
      keywords options[:query]          # full-text match against all text fields
      with(:published, true)            # only published articles
      order_by :updated_at, :desc       # newest first
      paginate :page => options[:page]
    end
  end
end
The keywords call is applied to all text fields. The remaining, non-text fields can be used to restrict the query (in the example, we restrict it to published Articles) and to order it (in the example, by updated_at). If you don’t define an ordering, results are sorted by relevance, based on the occurrence and location of the keywords in each document and in the index as a whole. You can tweak the relevance score by defining boosts: in this example, Articles whose titles match the keywords are given a boost over Articles that match the keywords only elsewhere.
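To get a feel for what :boost => 2.0 on :title does, here is a deliberately simplified scoring sketch. It is not Lucene's actual relevance formula, which also accounts for how rare a term is across the index and for field length; `toy_score` and `BOOSTS` are hypothetical and only show how multiplying per-field matches by a boost lets a title match outrank a body match.

```ruby
# Per-field boosts mirroring the searchable block: title counts double.
BOOSTS = { title: 2.0, body: 1.0 }.freeze

# Toy score: occurrences of the keyword in each field, times that field's boost.
def toy_score(doc, keyword)
  BOOSTS.sum do |field, boost|
    doc.fetch(field).downcase.scan(/\w+/).count(keyword.downcase) * boost
  end
end

docs = [
  { id: 1, title: "Deploying Rails", body: "Notes on Solr and Jetty" },
  { id: 2, title: "Solr in practice", body: "Indexing tips" }
]

ranked = docs.sort_by { |d| -toy_score(d, "solr") }
ranked.map { |d| d[:id] }  # => [2, 1]
```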

You can define multiple restrictions, and they don’t have to test for equality. A restriction can require a value to be less than, greater than or between given values, or, for an indexed array, to contain any or all of a set of values. The restriction with(:published, true) is simply shorthand for with(:published).equal_to(true). You can also test for the absence of a value using the without operator.
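The meaning of each kind of restriction can be sketched over plain Ruby hashes. The operator names echo the ones described above, but the `RESTRICTIONS` table and `restrict` helper are hypothetical illustrations of the comparisons, not Sunspot's implementation (Sunspot translates restrictions into Solr queries rather than filtering records in Ruby).

```ruby
# Hypothetical predicates for each kind of restriction.
RESTRICTIONS = {
  equal_to:     ->(value, arg) { value == arg },
  less_than:    ->(value, arg) { value < arg },
  greater_than: ->(value, arg) { value > arg },
  between:      ->(value, range) { range.cover?(value) },
  # For array-valued fields: match if any / all of the given values are present.
  any_of:       ->(values, args) { !(Array(values) & args).empty? },
  all_of:       ->(values, args) { (args - Array(values)).empty? }
}.freeze

# Keep docs whose field satisfies the restriction; negate: true mimics `without`.
def restrict(docs, field, op, arg, negate: false)
  docs.select { |doc| RESTRICTIONS.fetch(op).call(doc[field], arg) != negate }
end

articles = [
  { id: 1, published: true,  comments: 3,  tags: %w[rails solr] },
  { id: 2, published: false, comments: 10, tags: %w[ruby] },
  { id: 3, published: true,  comments: 7,  tags: %w[rails ruby] }
]

restrict(articles, :published, :equal_to, true).map { |a| a[:id] }                # => [1, 3]
restrict(articles, :comments, :between, 5..10).map { |a| a[:id] }                 # => [2, 3]
restrict(articles, :tags, :all_of, %w[rails ruby]).map { |a| a[:id] }             # => [3]
restrict(articles, :published, :equal_to, true, negate: true).map { |a| a[:id] }  # => [2]
```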

Finally, Sunspot plays nicely with the WillPaginate plugin. In your view, you can paginate simply with
<%= will_paginate @search.results %>
and expect it to work seamlessly.
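Underneath, page-based pagination is just offset arithmetic. The helper below is a hypothetical sketch, not will_paginate's or Sunspot's code, and it takes the page size as an explicit argument. It shows why a missing params[:page] can safely fall back to the first page.

```ruby
# Hypothetical helper: map a page parameter to an offset and a page count.
def page_window(page_param, per_page, total_hits)
  page = page_param.to_i           # nil.to_i and "abc".to_i are both 0
  page = 1 if page < 1             # fall back to the first page
  {
    page: page,
    offset: (page - 1) * per_page,
    total_pages: (total_hits + per_page - 1) / per_page  # ceiling division
  }
end

page_window(nil, 30, 95)  # page 1, offset 0, 4 total pages
page_window("3", 30, 95)  # page 3, offset 60, 4 total pages
```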

Conclusion

That’s all it takes to get up and running with Sunspot. My take-home point is that Sunspot exposes extremely flexible DSLs that let you scale from simple to fairly complicated queries with ease.

If this interests you, check out the Sunspot wiki for features not covered here, including keyword highlighting, facets and stored fields.

1 comment:

Unknown said...

This is a good introductory level capability, but if you really want to make solr sing you should check out the Solr Schema -- see http://www.lucidimagination.com/search/document/CDRG_ch04_4.1