ActiveRecord to Repository: Changing Persistence Patterns with the Elasticsearch Rails Gem
One of the Elasticsearch Rails integration gems provides a persistence layer for Ruby domain objects. Up through the 5.x series of this gem, elasticsearch-persistence, users could choose between the ActiveRecord and Repository patterns. With elasticsearch-persistence 6.0, the ActiveRecord pattern has been deprecated and removed. We realize that this means some of our users will have to invest additional time migrating their applications, but we are convinced it will pay off in the long run.
If you have an existing app using the gem's ActiveRecord pattern and want to upgrade to 6.0, this post is for you. If you are starting a new app with the Repository pattern, you may also find this guide useful.
Reason for Deprecation
While the ActiveRecord pattern of the elasticsearch-persistence gem was attractive for easily transitioning from a Rails app backed by a relational database to one using Elasticsearch, it introduces technical and conceptual difficulties over time. Rails was originally written to be used with a relational database and its semantics are at odds with an inherently non-relational storage option like Elasticsearch. The GitHub issues we've seen opened over the years for the elasticsearch-persistence gem are largely due to a difference between what ActiveRecord semantics promise and what a non-relational storage option like Elasticsearch can provide.
For this reason, we encourage our users to decouple domain objects from the persistence layer access code with the Repository pattern. With this pattern, users can define their Ruby objects as models and keep persistence code and rich Elasticsearch queries in a separate, repository class. The features of Elasticsearch go well beyond what a relational database provides and can be used easily in a Repository class. Using the Repository pattern frees up your code to make the most of Elasticsearch without the structure and schema constraints of an ActiveRecord model definition.
Example App: Music
The 5.x branch of the elasticsearch-persistence repo provided a Rails template for an app demonstrating use of the ActiveRecord pattern. The app has two models, Artist and Album, and they were persisted in the same index using the join datatype. While it's completely valid to employ this datatype, the example migration is easier to follow if the app persists artists and albums in separate indices. Please see the end of this article for a few notes on the join datatype.
We updated the base app to persist artists and albums in separate indices with the association: an artist can have many albums and an album belongs to a single artist. Then we migrated the app to use the Repository pattern in a series of commits. This app is demonstrative in nature and is far from feature-rich, but the goal of this guide is to illustrate and document the changes necessary to migrate an app from one using the ActiveRecord pattern to one using the Repository pattern.
In the guide below you will find:
- Explanations organized into various components of the Rails app
- Checklists serving as a reference for your migration
- In-depth explanations for significant changes
- Code snippets taken from the reference commits
References:
- elasticsearch-persistence gem
- music app using ActiveRecord pattern (starting point for the migration)
- Migrated music app using Repository pattern (result of the migration)
Repository Classes
Checklist:
- Include Elasticsearch::Persistence::Repository
- Include Elasticsearch::Persistence::Repository::DSL (if class-level configuration is needed)
- Define document_type, index_name, klass on an instance or at the class-level with the DSL module. We recommend using the default document_type, '_doc'.*
- Define mappings on a repository instance or at the class-level with the DSL module
- Define a #deserialize method for handling raw hashes returned from Elasticsearch queries. If the index contains documents corresponding to multiple model types, handle instantiation routing in this method
- Define methods for running custom, frequently-used queries
- Define explicit #save methods if certain options (e.g. routing) need to be used
- Write tests
* The ability to define a document type is deprecated and will be removed in future versions of Elasticsearch.
Naming
class ArtistRepository
end
Give your Repository class a name that associates it with the models it will be responsible for querying, serializing, and deserializing. Typically, there is a 1:1 mapping between a repository class and an Elasticsearch index. If each of your models is persisted in different indices, you would have one repository class for each model.
Include mixin(s)
class ArtistRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
end
Include the Elasticsearch::Persistence::Repository module. This will enrich the class with methods and provide access to a client, used to make requests. If you'd like to set configurations for the repository at the class-level, include the Elasticsearch::Persistence::Repository::DSL module. All instances of the repository class will then use the class-level configurations as a default. The settings can always be overridden at instantiation via an options argument.
Document Type, Index Name and klass
class ArtistRepository
  ...
  index_name 'artists'
  klass Artist
  ...
end
Define the document_type. With Elasticsearch 6.x, multiple types in a single index are no longer supported, and in future versions of Elasticsearch, the API will default to a document "type" of '_doc'. Therefore, for forward compatibility, the document_type of the objects persisted via the ArtistRepository is left as the default, '_doc', and you might want to do the same. Also, define the index name and the class (klass) that should be used to instantiate a new object from a document returned from Elasticsearch.
Mappings
class ArtistRepository
  ...
  mapping do
    indexes :name, analyzed_and_raw
    indexes :members, analyzed_and_raw
    indexes :profile
    indexes :members_combined, { analyzer: 'snowball' }
    indexes :artist_suggest, {
      type: 'object',
      properties: {
        name: { type: 'completion' },
        members: { type: 'completion' }
      }
    }
  end
  ...
end
Define mappings for the ArtistRepository and AlbumRepository. These mappings will be applied when #create_index! is called on the repository. You may want to define a rake task for creating each of your application's indices with their respective mappings.
Deserialization
Deserializations - Example Code
class ArtistRepository
  ...
  def deserialize(document)
    artist = super
    artist.id = document['_id']
    artist
  end
  ...
end
Define a #deserialize method on the ArtistRepository and AlbumRepository. Note that we must set the id field on the instantiated objects so that the id attribute is properly accessible for each model object.
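The checklist also mentions routing instantiation when a single index holds documents for more than one model type. The following is only a minimal sketch of that idea, not code from the reference commits. It assumes each document carries a hypothetical `type` field in its `_source`, uses `Struct` stand-ins for the models, and uses a hypothetical `MusicRepository` class that omits the `Elasticsearch::Persistence::Repository` mixin so the snippet runs on its own.

```ruby
# Stand-ins for the app's models; the real classes have more attributes.
Artist = Struct.new(:id, :name, keyword_init: true)
Album  = Struct.new(:id, :title, keyword_init: true)

# Hypothetical repository for an index holding both artists and albums.
# In an app, this class would include Elasticsearch::Persistence::Repository.
class MusicRepository
  # Maps the assumed `type` field in a document's _source to a model class.
  MODELS = { 'artist' => Artist, 'album' => Album }.freeze

  # `document` is a raw hit of the form { '_id' => ..., '_source' => { ... } }.
  def deserialize(document)
    source = document['_source']
    model  = MODELS.fetch(source['type']) do |type|
      raise ArgumentError, "unknown document type: #{type.inspect}"
    end
    # Drop the routing field and symbolize keys before building the object.
    attributes = source.reject { |key, _| key == 'type' }
                       .transform_keys(&:to_sym)
    model.new(id: document['_id'], **attributes)
  end
end

hit   = { '_id' => '1', '_source' => { 'type' => 'album', 'title' => 'Kind of Blue' } }
album = MusicRepository.new.deserialize(hit)
```

Failing loudly on an unknown type is a design choice in this sketch; an app could instead fall back to a default class.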
Query Definitions
Query Definitions - Example Code
class ArtistRepository
  ...
  def all(options = {})
    search({ query: { match_all: { } } },
           { sort: 'name.raw' }.merge(options))
  end
  ...
end
Next, define queries that are common in your application. For example, we define #all for both the ArtistRepository and AlbumRepository.
Custom Queries
class AlbumRepository
  ...
  def albums_by_artist(artist)
    search(query: { match: { artist_id: artist.id } })
  end
  def album_count_by_artist(artist)
    count(query: { match: { artist_id: artist.id } })
  end
  ...
end
We know that our application will also need to execute some custom queries repeatedly, so we will make them methods on the repository classes. We would otherwise need to execute them via the #search method. For example, we'll need to retrieve the album documents given a particular artist, and retrieve the count of albums, given a particular artist. We can define these methods on the AlbumRepository class. We'll also define the body used for a suggest request for the AlbumRepository and ArtistRepository.
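As an illustration of the suggest body mentioned above, here is a minimal sketch of what a `suggest_body` method could return for the artist index. It assumes the completion fields `artist_suggest.name` and `artist_suggest.members` from the mapping shown earlier. The suggestion names (`artist_name`, `artist_members`) and the `size` of 25 are hypothetical, and the repository mixins are omitted so the snippet runs without the gem.

```ruby
# Sketch only: in the app this method would live on the real ArtistRepository.
class ArtistRepository
  # Builds the request body for a completion-suggester search on `term`.
  def suggest_body(term)
    {
      suggest: {
        # Hypothetical suggestion names; they key the results in the response.
        artist_name: {
          prefix: term,
          completion: { field: 'artist_suggest.name', size: 25 }
        },
        artist_members: {
          prefix: term,
          completion: { field: 'artist_suggest.members', size: 25 }
        }
      }
    }
  end
end

body = ArtistRepository.new.suggest_body('bea')
```

Keeping the body in the repository means the Suggester only needs to know the repository's index name and this one method.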
Tests
Repository Tests - Example Code
It's important we ensure that our persistence and search methods are working properly, so we'll add tests for each of the repositories. We'll test that the correct mapping is used to create an index, that artists and albums are correctly serialized and deserialized, and that our custom queries execute as expected.
Models: Artist model, Album model
Checklist:
- Remove Elasticsearch::Persistence::Model module
- Include Elasticsearch::Model (if necessary)
- Include ActiveModel modules, if needed, e.g.:
  a. ActiveModel::Naming
  b. ActiveModel::Model
  c. ActiveModel::Validations
- Define validations, either custom Validators or simple validations available via ActiveModel::Validations
- Explicitly define associations as attributes
- Define defaults for attributes
- Add id as an explicit attribute
- Define #to_hash
  a. If custom logic is needed
  b. If there are associations whose id we need to delete from the persisted representation of the document
- Define #persisted? method, if necessary. Form helpers sometimes rely on it
- Update tests
Artist Model
class Artist
  include ActiveModel::Model
  include ActiveModel::Validations
end
We'll still have an Artist model defined in our app, but it's no longer persisted via methods on the instances themselves with the ActiveRecord pattern. We'll include some other mixins to maintain certain functionalities, like validations. In the reference commit, we remove the Elasticsearch::Persistence::Model module and include the following modules:
- ActiveModel::Model: supplies some methods necessary for form helpers
- ActiveModel::Validations: provides error message caching and validation methods
Mappings
Artist Mappings - Example Code
Remove the mapping option passed to the attribute methods, as this was used by Elasticsearch::Persistence::Model to construct the mapping document sent when creating an index. Instead, define the #mapping on the ArtistRepository using the Elasticsearch::Persistence::Repository::DSL module. Note that this is used by the ArtistRepository when #create_index! is called.
Validations
Artist Validations - Example Code
class Artist
  ...
  validates :name, presence: true
  ...
end
class ArtistRepository
  ...
  def serialize(artist)
    artist.validate!
    artist.to_hash.tap do |hash|
      suggest = { name: { input: [ hash[:name] ] } }
      if hash[:members].present?
        suggest[:members] = { input: hash[:members].collect(&:strip) }
      end
      hash.merge!(:artist_suggest => suggest)
    end
  end
  ...
end
We can still define validations on the Artist model if we include ActiveModel::Validations. However, #validate! must be called explicitly at persistence time. Because the repository is responsible for persisting the objects, we call #to_hash on the artist objects in the ArtistRepository #serialize method and put the call to #validate! there.
Separation of Domain Object Logic and Persistence Logic
Clean up the models by removing methods that are now called on the repositories instead. These methods are those that make requests to Elasticsearch, as we want to keep all interactions with the persistence layer in the repository classes. Also remove anything relating to index configuration or creation in the models.
Custom Methods
Artist Custom Methods - Example Code
class Artist
  ...
  def persisted?
    !!id
  end
  ...
end
Some view helpers rely on a #persisted? method on the model object being available, so define one explicitly.
Tests
Album Model
The Album model gets the same makeover as the Artist model. The first step is to remove the Elasticsearch::Persistence::Model mixin and include other necessary mixins. See the first step in the Artist model migration for details.
Mapping
Define a mapping for the Album model using Elasticsearch::Persistence::Repository::DSL on the AlbumRepository and remove methods related to defining and creating an index.
Associations and Validations
Album Associations and Validations - Example Code
class Album
  class Validator < ActiveModel::Validator
    ERROR_MESSAGE = 'An album must be associated with an artist.'.freeze
    def validate(album)
      unless album.title && album.artist && album.artist.persisted?
        album.errors.add(:base, ERROR_MESSAGE)
      end
    end
  end
end
We want to require that an album object be associated with an artist before it is persisted. Doing so requires slightly complex logic, so we extract the code into a custom Validator. As we did with the Artist model, call #validate! explicitly in the #serialize method on the album repository.
Separation of Domain Object Logic and Persistence Logic
Remove methods relating to persistence that should be called on the AlbumRepository instance instead.
Id Attribute
The id attribute is not automatically assumed or handled for either the artist or album model, so we add an explicit id attribute.
Tests
Suggester
Checklist:
- Change all methods called on model objects relating to persistence and search to use the repository object instead
class Suggester
  ...
  def execute!(*repositories)
    @responses ||= []
    repositories.each do |repository|
      @responses << begin
        repository.client.search(index: repository.index_name,
                                 body: repository.suggest_body(@term))
      end
    end
  end
  ...
end
Our suggester object should use a repository when doing custom queries instead of the Artist model.
Tests
Suggester Tests - Example Code
We also update the Suggester tests to use a repository instead.
Rails Initializer
Checklist:
- Decide where to define the repository object
  a. In a controller?
  b. In an initializer?
Rails Initializer - Example Code
There are a number of ways to define the repository object(s) used in the app. They can be set up in the initializer as constants or global variables. Alternatively, they can be created as instance variables in each controller when requests are handled. You should consider how expensive it is to instantiate repository objects when choosing which method to use.
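One way to avoid repeated instantiation without globals is to build each repository once and reuse it. The sketch below is an assumption-laden illustration, not the approach taken in the reference commits: `Repositories` is a hypothetical module, and `ArtistRepository` here is a bare stand-in for the real class, which would include `Elasticsearch::Persistence::Repository` and accept a `:client` option.

```ruby
# Stand-in for the real repository class so the sketch runs without the gem.
class ArtistRepository
  attr_reader :client

  def initialize(options = {})
    @client = options[:client]
  end
end

# Hypothetical registry that memoizes repository instances.
module Repositories
  class << self
    # Set once at boot, e.g. from a Rails initializer.
    attr_writer :client

    # Builds the repository on first use and returns the same object afterwards.
    def artists
      @artists ||= ArtistRepository.new(client: @client)
    end
  end
end

Repositories.client = :shared_client # an Elasticsearch::Client in a real app
first  = Repositories.artists
second = Repositories.artists
```

Controllers would then call `Repositories.artists` instead of referring to a global variable. Note that `||=` is not synchronized, so in a threaded server it may be safer to trigger the first call during boot.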
Controllers: Artists controller, Albums controller
Checklist:
- Change all methods called on model objects relating to persistence to use the repository object instead
- Extract the id from the response returned after calling #save on the repository. Set it on the newly-persisted model object
class ArtistsController < ApplicationController
  ...
  def index
    @artists = $artist_repository.all(sort: 'name.raw')
  end
  ...
end
The controllers are updated to use the repositories. When calling #save on the model object with the ActiveRecord pattern, the document with its new id was returned as a whole entity. With the repository object, only the id of the indexed document is returned, so ensure that this is handled appropriately. For example, we assign the new id to the newly persisted artist object so that it's considered persisted by the form helper.
Alternatively, you can define a custom #save method on the repository class that sets the id.
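A custom #save along those lines might look like the sketch below. It assumes that the underlying #save returns a hash containing the new document's `'_id'`. `FakeRepository` is a hypothetical stand-in for `Elasticsearch::Persistence::Repository` so the snippet runs without the gem or a cluster, and the `Artist` struct is a simplified model.

```ruby
# Stand-in for Elasticsearch::Persistence::Repository.
# Assumption: #save returns the index response hash, including '_id'.
module FakeRepository
  def save(document, options = {})
    { '_id' => 'generated-id', 'result' => 'created' }
  end
end

# Simplified model with an explicit id attribute.
Artist = Struct.new(:id, :name, keyword_init: true)

class ArtistRepository
  include FakeRepository

  # Saves the artist, then copies the id from the response onto the object
  # so that helpers relying on #persisted? treat it as saved.
  def save(artist, options = {})
    super.tap { |response| artist.id = response['_id'] }
  end
end

artist   = Artist.new(name: 'Alice Coltrane')
response = ArtistRepository.new.save(artist)
```

Because the override still returns the response, existing callers that read the return value keep working.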
URLs and Routes
No updates needed!
Views
Checklist:
- Update any queries using model objects to use a repository instead
class ArtistsController < ApplicationController
  ...
  def show
    @albums = $album_repository.albums_by_artist(@artist)
  end
  ...
end
Where we once relied on model methods to access associations, we'll now change the code to use the repository to do object retrieval. For example, we use our custom #albums_by_artist method on the album repository to retrieve the albums by a given artist. We must make other changes in the artists and search views to use the repository instead of the object instances.
Important Notes
Defining Attributes on POROs (Plain Old Ruby Objects)
There are a number of gems that allow us to define attributes with types on POROs, essentially allowing us to define a schema for Ruby objects directly in their model files. We can choose one of these libraries and use it, or we can handle our attributes explicitly with custom code. The base app using the ActiveRecord pattern was written at a time when Virtus was a popular gem for defining attributes on Ruby objects. Since then, other gems have gained popularity in the community.
In order to demonstrate that we don't necessarily need to depend on another gem to attain attribute functionality, we handle the model attributes explicitly. You are still free to use whatever gem you'd like to enrich your PORO.
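To make the idea concrete, here is a minimal sketch of a model that handles its attributes explicitly, using plain Ruby only. It is not the code from the reference commits. The attribute list is taken from the artist mapping shown earlier, while the empty-array default for `members` and the decision to silently ignore unknown keys are assumptions made for illustration.

```ruby
class Artist
  # Explicit attribute list, including id, which is no longer implied.
  ATTRIBUTES = [:id, :name, :profile, :members].freeze
  attr_accessor(*ATTRIBUTES)

  # Accepts symbol or string keys; unknown keys are ignored in this sketch.
  def initialize(attributes = {})
    attributes.each do |name, value|
      public_send("#{name}=", value) if ATTRIBUTES.include?(name.to_sym)
    end
    @members ||= [] # assumed default for an attribute that was not given
  end

  # Hash representation handed to the repository for serialization.
  def to_hash
    ATTRIBUTES.each_with_object({}) do |name, hash|
      hash[name] = public_send(name)
    end
  end

  # Form helpers use this to tell new records from saved ones.
  def persisted?
    !!id
  end
end

artist = Artist.new(name: 'Nina Simone', 'profile' => 'Singer and pianist', unknown: 1)
```

The trade-off is that type coercion and defaults must be written by hand, which an attribute gem would otherwise provide.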
The Elasticsearch Join Datatype
The join datatype is a special field that creates parent/child relations within documents in the same index. Using the join datatype with the has_parent query or the has_child query adds significant overhead to query performance. That said, the join datatype can be used to model a more complex schema. The lineage of a parent must be indexed in the same shard as a child, so we must always route child documents using their parent id. If you persist parent and child documents in a single index, ensure that the routing values are taken into account when both saving and retrieving child documents.
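The routing requirement can be illustrated with a small sketch that builds the arguments for indexing a child document. Everything named here is hypothetical: the helper `child_index_request`, the index name `music`, the join field name `join_field`, and the relation name `album` would all need to match your own mapping. The hash is shaped like the arguments an Elasticsearch client's index call takes, and depending on the client version a `type:` argument may also be required.

```ruby
# Builds index-request arguments for a child (album) document.
# The routing value is the parent id, so the child lands on the parent's shard.
def child_index_request(index:, id:, parent_id:, body:)
  {
    index: index,
    id: id,
    routing: parent_id,
    # Hypothetical join field; name and relation must match the index mapping.
    body: body.merge(join_field: { name: 'album', parent: parent_id })
  }
end

request = child_index_request(index: 'music',
                              id: 'album-1',
                              parent_id: 'artist-1',
                              body: { title: 'Blue Train' })
```

The same routing value has to be supplied when fetching or deleting the child by id, which is one reason an explicit #save (and matching lookup method) on the repository can be useful.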
Wrapping Up / Reaching Out
We hope that this guide has been useful for you whether you are starting a new app using the Repository pattern or migrating an existing one from the ActiveRecord pattern. If you have any questions, don't hesitate to reach out to us via an issue in the elasticsearch-rails repo.