Using JSONB columns to create pseudo-attributes in Rails

Recently, I’ve noticed that more and more people seem to be using JSONB columns with PostgreSQL in Rails apps—specifically to create a flexible place to define model attributes in a way that’s cheap and easily modifiable. A sort of pseudo-attribute, created on the fly, without having to create DB migrations or think about back-filling data, etc.

A brief example

To demonstrate the principle, consider an online bookstore with a Book model like this:

# == Schema Information
#
# Table name: books
#
#  id         :bigint           not null, primary key
#  metadata   :jsonb
#  title      :string
#  created_at :datetime         not null
#  updated_at :datetime         not null
#  author_id  :bigint           not null
#  ...
class Book < ApplicationRecord
end

The books table has a column named metadata that is defined as a JSONB type.

Now image that our team is intending to test a new feature: We’ll show a book’s page count on its marketing page, to see if this will have an impact on sales.

We create a background job that can be run for each book, which will count the number of pages in a book and store that value on the book record. To make life easier, we’ll store the page count on the metadata of a book, so we don’t have to worry about adding a new database column for the purpose of adding our new test.

The book’s pages are counted by a class called Book::PageCounter—the implementation of PageCounter is irrelevant for our example.

class Book::PageCounterJob < ApplicationJob
  def perform(book)
    page_count = Book::PageCounter.new(book).count

    book.metadata ||= {}
    book.metadata[:page_count] = page_count
    book.save!
  end
end

If you’ve written a job in ActiveJob before, the above code should be fairly simple to follow. We ask the Book::PageCounter to count the number of pages a Book has, and store that value in a local variable named page_count. Next, we ensure that the book’s metadata attribute is defined as a Hash, and then set the :page_count value of the Hash to the page_count variable. Finally, we save these changes to the database.

Lastly, to add the book’s page count to its marketing page, we simply need to write something like:

<% if Hash(@book.metadata).fetch(:page_count).present? %>
  Page count: <%= @book.metadata[:page_count] %> 
<% end %>

This way of persisting and retrieving data on our records, without the need to create a new database migration, update existing records etc., has its benefits when rapidly prototyping, and it’s not difficult to see why some people are attracted to this pattern.

But I believe this pattern should be implemented with some sensible constraints, if not completely avoided. I also believe that these pseudo-attributes should ultimately be considered technical debt that should eventually be productionised.

What‘s the problem with using JSONB columns directly?

When I first came across this pattern, it smelled off to me. And I spent a while thinking about what exactly I dislike about it. Here are some of the concerns that came to mind…

Firstly, this pattern breaks proper encapsulation.

One of the defining concepts in Object-Oriented programming is that classes have data that are accessed through a well defined public interface (attributes). This pattern bypasses the proper encapsulation of data through a public interface, and instead leaks implementation detail about this class outside of the class. In the above example, both our PageCounterJob object and our view template code know that the pages_count value being stored is being stored in a Hash on the Book object. While this might seem like a minor infraction, this makes all of the code that interacts with our Book in this way more complex, and likely to change in future.

Secondly, this pattern creates an undocumented property.

When data is stored directly in a JSONB column without clear documentation or an obvious interface, it becomes more challenging for other developers to understand where and how the information is being stored and retrieved. Again, it may be less clear what the purpose of the data being stored is, leading to potential misinterpretation or misuse. This lack of clarity and transparency can create a barrier to understanding the codebase and slow down the development process, as developers may need to spend additional time knowing what data is for, or where it’s used.

Thirdly, JSONB data can have different shapes and is inherently inconsistent.

While one of the advantages of using JSONB is its flexibility, allowing for a variety of data types and structures, this lack of a well defined schema with proper constraints can also lead to inconsistency in the data stored within the column. This inconsistency can be problematic when dealing with large datasets, or data with JSONB schemas that change multiple times over time, because an instance of a given model may have different shaped data depending on when it was created, or what processes have affected it. While this risk also exists with individually defined database columns, the risk is usually mitigated through proper use of constraints and data normalisation.

JSONB columns are slower than more primitive types (integer, etc) and are more complex and costly to index. If you plan on searching for records via these attributes in a reasonably well used application, you’re probably going to run into some performance issues before long.

What’s a better solution?

A better approach is to encapsulate the JSONB data within well-defined setter and getter methods in your model. This ensures that the rest of your application isn’t directly accessing or manipulating the JSONB column, and instead interacts with your records through a properly defined interface.

Consider this change to our previous example:

# == Schema Information
#
# Table name: books
#
#  id         :bigint           not null, primary key
#  metadata   :jsonb
#  title      :string
#  created_at :datetime         not null
#  updated_at :datetime         not null
#  author_id  :bigint           not null
#  ...
class Book < ApplicationRecord
  def pages_count
    metadata['pages_count']
  end

  def pages_count=(count)
    self.metadata['pages_count'] = count.to_i
  end
  
  def pages_count?
    pages_count.present?
  end
end
class Book::PageCounterJob < ApplicationJob
  def perform(book)
    page_count = Book::PageCounter.new(book).count

    book.update!(page_count: page_count)
  end
end
<% if @book.pages_count? %>
  Page count: <%= @book.pages_count %> 
<% end %>

In this example, we’ve created #pages_count and #pages_count= methods that respectively get and set the pages_count attribute within the metadata JSONB column.

Instead of accessing the metadata hash directly, we can use book.pages_count and book.pages_count= to interact with the pages_count data, without any other classes knowing or caring that it’s stored within a JSONB column.

The #pages_count? method provides a way for consumers of our @book to ask about its page count, without having to infer that a Hash is being used behind the scenes.

This encapsulation not only makes our code cleaner and easier to read, but also helps in maintaining the integrity of your data. In the setter method, we coerce the value of pages count to be an Integer, which should ensure that values are always being stored and retrieved as the type we expect them to be.

By providing a clear interface to interact with the data stored in JSONB columns, we now can control how data is read and written, enabling us to change either of these behaviours in future without having to change all of the areas of the code that consume our Book class.

The Rails Way solution

Thankfully, Rails already provides this behaviour out of this box with its ActiveRecord::Store API. Let’s have a look at out previous example, and see how we might improve it using Rails’s Store API.

# == Schema Information
#
# Table name: books
#
#  id         :bigint           not null, primary key
#  metadata   :jsonb
#  title      :string
#  created_at :datetime         not null
#  updated_at :datetime         not null
#  author_id  :bigint           not null
#  ...
class Book < ApplicationRecord
  store_accessor :metadata, :pages_count
end

By using Rails’s .store_accessor method, we can define the getter and setter method for pages_count, and Rails will store it on our metadata column as a hash (serialised as JSON), without us having to define how that works.

This method also provides dirty tracking methods (e.g. #pages_count_changed? ) by default, which helps us to define a richer and more introspective API, at no extra cost. In this way, our attributes behave more like the attributes that are mapped to their own database column.

This solution isn’t exactly the same as the methods we rolled ourselves though, as it doesn’t provide a presence predicate method #pages_count? and it doesn’t coerce the value of the pages_count to an integer when set—meaning we could set this attribute as a String or another type, and the code wouldn’t protect against that.

We can add this missing functionality with the following changes:

# == Schema Information
#
# Table name: books
#
#  id         :bigint           not null, primary key
#  metadata   :jsonb
#  title      :string
#  created_at :datetime         not null
#  updated_at :datetime         not null
#  author_id  :bigint           not null
#  ...
class Book < ApplicationRecord
  store_accessor :metadata, :pages_count
  
  def pages_count=(value)
    super(value.to_i)
  end
  
  def pages_count?
    pages_count.present?
  end
end

In the above block, we have updated the #pages_count= method to provide the Integer coercion, and then called super to continue the method execution as provided by store_accessor.

We’ve also added the missing #pages_count? method, to give us a more introspective way of checking whether the value of pages count has been set yet.

A third party solution

If this is a pattern you plan on using often in your Rails application, I would also recommend checking out the jsonb_accessor Ruby gem, which provides a much richer set of features than Rails’s ActiveRecord::Store API.

In summary

Honestly, I’m not a huge fan of using JSONB columns in this way. Adding new database columns to tables in Rails is already very quick and easy to do, so the productivity benefits of being able to conjure a pseudo-attribute at will hardly seems worth it. If you decide that you still like this pattern, then by all means fill your boots! But my advice would be to take a few minutes to define proper getter and setter methods, rather than accessing the attribute hash directly.

Happy coding!

Written by

Photo of Gavin Morrice
Gavin Morrice

Software engineer based in Scotland

Connect with me