Recently, I’ve noticed that more and more people seem to be using JSONB columns with PostgreSQL in Rails apps—specifically to create a flexible place to define model attributes in a way that’s cheap and easily modifiable. A sort of pseudo-attribute, created on the fly, without having to create DB migrations or think about back-filling data, etc.
A brief example
To demonstrate the principle, consider an online bookstore with a Book model like this:
# == Schema Information
#
# Table name: books
#
# id :bigint not null, primary key
# metadata :jsonb
# title :string
# created_at :datetime not null
# updated_at :datetime not null
# author_id :bigint not null
# ...
class Book < ApplicationRecord
end
The books table has a column named metadata that is defined as a JSONB type.
Now image that our team is intending to test a new feature: We’ll show a book’s page count on its marketing page, to see if this will have an impact on sales.
We create a background job that can be run for each book, which will count the number of pages in a book and store that value on the book record. To make life easier, we’ll store the page count on the metadata of a book, so we don’t have to worry about adding a new database column for the purpose of adding our new test.
The book’s pages are counted by a class called Book::PageCounter
—the implementation of PageCounter
is irrelevant for our example.
class Book::PageCounterJob < ApplicationJob
def perform(book)
page_count = Book::PageCounter.new(book).count
book.metadata ||= {}
book.metadata[:page_count] = page_count
book.save!
end
end
If you’ve written a job in ActiveJob before, the above code should be fairly simple to follow. We ask the Book::PageCounter
to count the number of pages a Book has, and store that value in a local variable named page_count
. Next, we ensure that the book’s metadata
attribute is defined as a Hash, and then set the :page_count
value of the Hash to the page_count
variable. Finally, we save these changes to the database.
Lastly, to add the book’s page count to its marketing page, we simply need to write something like:
<% if Hash(@book.metadata).fetch(:page_count).present? %>
Page count: <%= @book.metadata[:page_count] %>
<% end %>
This way of persisting and retrieving data on our records, without the need to create a new database migration, update existing records etc., has its benefits when rapidly prototyping, and it’s not difficult to see why some people are attracted to this pattern.
But I believe this pattern should be implemented with some sensible constraints, if not completely avoided. I also believe that these pseudo-attributes should ultimately be considered technical debt that should eventually be productionised.
What‘s the problem with using JSONB columns directly?
When I first came across this pattern, it smelled off to me. And I spent a while thinking about what exactly I dislike about it. Here are some of the concerns that came to mind…
Firstly, this pattern breaks proper encapsulation.
One of the defining concepts in Object-Oriented programming is that classes have data that are accessed through a well defined public interface (attributes). This pattern bypasses the proper encapsulation of data through a public interface, and instead leaks implementation detail about this class outside of the class. In the above example, both our PageCounterJob
object and our view template code know that the pages_count
value being stored is being stored in a Hash
on the Book
object. While this might seem like a minor infraction, this makes all of the code that interacts with our Book
in this way more complex, and likely to change in future.
Secondly, this pattern creates an undocumented property.
When data is stored directly in a JSONB column without clear documentation or an obvious interface, it becomes more challenging for other developers to understand where and how the information is being stored and retrieved. Again, it may be less clear what the purpose of the data being stored is, leading to potential misinterpretation or misuse. This lack of clarity and transparency can create a barrier to understanding the codebase and slow down the development process, as developers may need to spend additional time knowing what data is for, or where it’s used.
Thirdly, JSONB data can have different shapes and is inherently inconsistent.
While one of the advantages of using JSONB is its flexibility, allowing for a variety of data types and structures, this lack of a well defined schema with proper constraints can also lead to inconsistency in the data stored within the column. This inconsistency can be problematic when dealing with large datasets, or data with JSONB schemas that change multiple times over time, because an instance of a given model may have different shaped data depending on when it was created, or what processes have affected it. While this risk also exists with individually defined database columns, the risk is usually mitigated through proper use of constraints and data normalisation.
Lastly, JSONB columns are inherently slower to search.
JSONB columns are slower than more primitive types (integer, etc) and are more complex and costly to index. If you plan on searching for records via these attributes in a reasonably well used application, you’re probably going to run into some performance issues before long.
What’s a better solution?
A better approach is to encapsulate the JSONB data within well-defined setter and getter methods in your model. This ensures that the rest of your application isn’t directly accessing or manipulating the JSONB column, and instead interacts with your records through a properly defined interface.
Consider this change to our previous example:
# == Schema Information
#
# Table name: books
#
# id :bigint not null, primary key
# metadata :jsonb
# title :string
# created_at :datetime not null
# updated_at :datetime not null
# author_id :bigint not null
# ...
class Book < ApplicationRecord
def pages_count
metadata['pages_count']
end
def pages_count=(count)
self.metadata['pages_count'] = count.to_i
end
def pages_count?
pages_count.present?
end
end
class Book::PageCounterJob < ApplicationJob
def perform(book)
page_count = Book::PageCounter.new(book).count
book.update!(page_count: page_count)
end
end
<% if @book.pages_count? %>
Page count: <%= @book.pages_count %>
<% end %>
In this example, we’ve created #pages_count
and #pages_count=
methods that respectively get and set the pages_count
attribute within the metadata
JSONB column.
Instead of accessing the metadata hash directly, we can use book.pages_count
and book.pages_count=
to interact with the pages_count
data, without any other classes knowing or caring that it’s stored within a JSONB column.
The #pages_count?
method provides a way for consumers of our @book
to ask about its page count, without having to infer that a Hash is being used behind the scenes.
This encapsulation not only makes our code cleaner and easier to read, but also helps in maintaining the integrity of your data. In the setter method, we coerce the value of pages count to be an Integer, which should ensure that values are always being stored and retrieved as the type we expect them to be.
By providing a clear interface to interact with the data stored in JSONB columns, we now can control how data is read and written, enabling us to change either of these behaviours in future without having to change all of the areas of the code that consume our Book class.
The Rails Way solution
Thankfully, Rails already provides this behaviour out of this box with its ActiveRecord::Store API. Let’s have a look at out previous example, and see how we might improve it using Rails’s Store API.
# == Schema Information
#
# Table name: books
#
# id :bigint not null, primary key
# metadata :jsonb
# title :string
# created_at :datetime not null
# updated_at :datetime not null
# author_id :bigint not null
# ...
class Book < ApplicationRecord
store_accessor :metadata, :pages_count
end
By using Rails’s .store_accessor
method, we can define the getter and setter method for pages_count
, and Rails will store it on our metadata
column as a hash (serialised as JSON), without us having to define how that works.
This method also provides dirty tracking methods (e.g. #pages_count_changed?
) by default, which helps us to define a richer and more introspective API, at no extra cost. In this way, our attributes behave more like the attributes that are mapped to their own database column.
This solution isn’t exactly the same as the methods we rolled ourselves though, as it doesn’t provide a presence predicate method #pages_count?
and it doesn’t coerce the value of the pages_count
to an integer when set—meaning we could set this attribute as a String or another type, and the code wouldn’t protect against that.
We can add this missing functionality with the following changes:
# == Schema Information
#
# Table name: books
#
# id :bigint not null, primary key
# metadata :jsonb
# title :string
# created_at :datetime not null
# updated_at :datetime not null
# author_id :bigint not null
# ...
class Book < ApplicationRecord
store_accessor :metadata, :pages_count
def pages_count=(value)
super(value.to_i)
end
def pages_count?
pages_count.present?
end
end
In the above block, we have updated the #pages_count=
method to provide the Integer coercion, and then called super
to continue the method execution as provided by store_accessor
.
We’ve also added the missing #pages_count?
method, to give us a more introspective way of checking whether the value of pages count has been set yet.
A third party solution
If this is a pattern you plan on using often in your Rails application, I would also recommend checking out the jsonb_accessor Ruby gem, which provides a much richer set of features than Rails’s ActiveRecord::Store API.
In summary
Honestly, I’m not a huge fan of using JSONB columns in this way. Adding new database columns to tables in Rails is already very quick and easy to do, so the productivity benefits of being able to conjure a pseudo-attribute at will hardly seems worth it. If you decide that you still like this pattern, then by all means fill your boots! But my advice would be to take a few minutes to define proper getter and setter methods, rather than accessing the attribute hash directly.
Happy coding!