Sortable Semantic Version Strings in Rails (part 2)

This post is part 2 of 3 describing how Semantic Version strings can be converted to sortable integers for easier and faster sorting.

  • Part 1 explains the concept and the constraints of this method
  • Part 2 describes how to implement this in ActiveRecord
  • Part 3 (TODO) describes an alternative approach that is SQL driven

Implementing this in ActiveRecord

In the previous post, we walked through the concept of how we can store a SemVer as a 32-bit integer so that we can properly sort records in the correct order precedence for SemVer. Now, let’s have a look at how we could add this to a hypothetical ActiveRecord model.

An example model

Consider a model called UserDevice that tracks the existence of a mobile device for a user. Since a user might be logged in on more than one device, we can say that a user has many user devices. This model and table records the user, the device operating system, and the latest known app version on that device:

# == Schema Information
#
# Table name: user_devices
#
#  id                          :bigint           not null, primary key
#  user_id                     :bigint           not null
#  operating_system            :text             not null
#  latest_app_version          :string           not null
#  created_at                  :datetime         not null
#  updated_at                  :datetime         not null
#
class UserDevice < ApplicationRecord
  # ... 
  
  scope :ordered_by_latest_app_version, 
          -> { order(:latest_app_version) }
  
  # ...
end

This version of our model currently suffers from the bug mentioned in the part 1 post. Namely, that records will appear in the wrong order when sorted by latest_app_version (i.e. 1.10.0 would come before 1.9.0).

To fix this, we need a way of storing our latest_app_version SemVer string into a sortable integer representation.

We probably also want to see a readable version of the SemVer string that makes sense, so we should store our sortable column as an additional column on our table, and leave the current (more readable) latest_app_version in place

We can add this with a simple database migration:

$ rails g migration add_latest_app_version_bitpack_to_user_devices latest_app_version_bitpack:integer

This should create a migration that looks something like:

class AddLatestAppVersionBitpackToUserDevices < ActiveRecord::Migration[8.0]
  def change
    add_column :user_devices, 
               :latest_app_version_bitpack, 
               :integer
  end
end

Let’s run that migration and update the table to include our new column!

$ rails db:migrate

Now we can see in the schema information comments above our model, that the latest_app_version_bitpack has been added and is defined as an integer

# == Schema Information
#
# Table name: user_devices
#
#  id                          :bigint           not null, primary key
#  user_id                     :bigint           not null
#  operating_system            :text             not null
#  latest_app_version          :string           not null
#  latest_app_version_bitpack  :integer
#  created_at                  :datetime         not null
#  updated_at                  :datetime         not null
#
class UserDevice < ApplicationRecord
# ...

scope :ordered_by_latest_app_version, 
        -> { order(:latest_app_version) }

# ...
end

This column won’t be usable yet, though, because we haven’t populated it with any data. To do so, we’ll have to find a way to create our BitPack representations of a SemVer string that we designed in Part 1. Let’s do that now…

Bitwise operators in Ruby

Ruby supports standard bitwise operators, with the following operator methods:

8 << 2 # => 32  (Left-shift)
8 >> 2 # => 2   (Right-shift)
8 | 2  # => 10  (Or)
8 & 2  # => 0   (And)
8 ^ 2  # => 0   (Xor)
~8     # => -9  (Not)

Luckily, we’re only interested in the two of these for this solution: the << and | operators.

Shift operators

In Part 1, we were able to assign portions of our long 32-bit binary value, so that sub-sections of that number meant something to us. This allows us to encode multiple values in the same 32-bit binary representation. But how exactly do we assign portions of a binary number like this? That’s exactly what left-shift and right-shift operators do.

Let’s look at a really simple example. Here’s the 8-bit binary representation for the number 1:

0000 0001

But what if we were to shift those numbers all one place to the left? So that it looks like this:

0000 0010

This is exactly what the left-shift operator does. We can try this on the number 1 in Ruby to see the output:

value = (1 << 1) # shift the number 1 one position to the left
sprintf("%08b", value) # Print the 8-bit binary value for this number
# => "00000010"

Simply put, left-shift will take the binary representation of an Integer, and shift all of the numbers one position to the left.

Incidentally, right-shift takes the binary representation of an Integer, and shifts all of the numbers one position to the right.

The OR operator

We’ve looked at how the shift operators move bits around, but how do we actually combine those bits into one integer? That’s where the bitwise OR operator (|) comes in. It compares each bit of its operands and sets each bit in the result to 1 if that bit is 1 in either operand.

For instance, lets see what the or operator looks like when we use it with these two binary values 00000110 and 00000011:

binary_value = 0b00000110 | 0b00000011
sprintf("%08b", binary_value)
# => "00000111"

All of the bits that are 0 in both numbers are still 0 in the resulting value. But if any of the bits are 1 in either of the numbers, then they are also 1 in the resulting value. This is similar to a boolean OR that we use in Ruby, since the value is true if either operand is true—just applied to each bit in the numbers.

This is a convenient way to concatenate two binary values together. Because each component is shifted into its own region of the 32-bit integer, there is no overlap—making the OR operator an ideal way to combine multiple shifted values into one sortable integer.

Encoding our SemVer string

Now that we understand what the left-shift operator does, we have everything we need to encode a SemVer string into a 32-bit integer in Ruby.

Let’s see how that would look like in code—don’t worry, we’ll go through it bit by bit (pardon the pun):

class SemanticVersion
  SEMVER_STRING_REGEXP = /^
    (?<major>\d+)\.                 # Major version number
    (?<minor>\d+)\.                 # Minor version number
    (?<patch>\d+)                   # Patch version number
    (-?)                            # Optional pre-release separator
    (?<release_type>alpha|beta|rc)? # Optional pre-release type
    (\.?)
    (?<build>\d*)?                  # Optional pre-release version number
    (?<metadata>.*)?                # Optional metadata (ignored)
  $/ix

  PACK_SHIFTS = {
    major: 25,
    minor: 15,
    patch: 5,
    release_type: 3,
    build: 0
  }

  RELEASE_TYPES = %w[alpha beta rc release]
  DEFAULT_RELEASE_TYPE = "release"

  attr_reader :version_string

  attr_reader :major, :minor, :patch, :release_type, :build
  private :major, :minor, :patch, :release_type, :build

  def initialize(version_string)
    version_string_match = version_string.to_s.match(SEMVER_STRING_REGEXP)

    fail "Invalid Semantic Version string: #{version_string}" if version_string_match.nil?

    @major = version_string_match[:major].to_i
    @minor = version_string_match[:minor].to_i
    @patch = version_string_match[:patch].to_i
    @release_type = version_string_match[:release_type] || DEFAULT_RELEASE_TYPE
    @build = version_string_match[:build].to_i
  end

  def to_bitpack
    (major << PACK_SHIFTS[:major]) |
      (minor << PACK_SHIFTS[:minor]) |
      (patch << PACK_SHIFTS[:patch]) |
      (release_type_index << PACK_SHIFTS[:release_type]) |
      (build)
  end

  private

  def release_type_index
    RELEASE_TYPES.index(release_type)
  end
end

We start by defining a regular expression that matches a SemVer String. The expression uses the exploded modifier, so that we can break it up across multiple lines and make it easier to read:

SEMVER_STRING_REGEXP = /^
    (?<major>\d+)\.                 # Major version number
    (?<minor>\d+)\.                 # Minor version number
    (?<patch>\d+)                   # Patch version number
    (-?)                            # Optional pre-release separator
    (?<release_type>alpha|beta|rc)? # Optional pre-release type
    (\.?)
    (?<build>\d*)?                  # Optional pre-release version number
    (?<metadata>.*)?                # Optional metadata (ignored)
  $/ix

If you’re not used to reading regular expressions in this format, don’t stress too much. For now, it’s enough to say that this regular expression will capture the important components of a SemVer string using the following named captures:

SEMVER_STRING_REGEXP.names
=> ["major", "minor", "patch", "release_type", "build", "metadata"]

Next, we define a constant that stores the number of places that we will left shift each captured section of the string by, when we encode it as 32-bit binary. These values match the sizes that we defined for each segment in Part 1.

PACK_SHIFTS = {
  major: 25,
  minor: 15,
  patch: 5,
  release_type: 3,
  build: 0
}

We then define a constant to hold all of the release types that we decided we would support. This will act as as sort of enum, and allow us to easily convert a release type to an integer (and binary) value. For example:

RELEASE_TYPES = %w[alpha beta rc release]
release_type_index = RELEASE_TYPES.index('rc') # => 2
release_type_index.to_s(2) # => '10'

In the initializer, we set all of the components of our SemVer string to instance variables. Because we have defined named captures in our regular expression, this is really simple to map to the match value object returned when we match the version_string against SEMVER_STRING_REGEXP.

version_string_match = version_string.to_s.match(SEMVER_STRING_REGEXP)
@major = version_string_match[:major].to_i
@minor = version_string_match[:minor].to_i
@patch = version_string_match[:patch].to_i
@release_type = version_string_match[:release_type] || DEFAULT_RELEASE_TYPE
@build = version_string_match[:build].to_i

You’ll notice the @release_type is set to DEFAULT_RELEASE_TYPE if there’s no match value. This is to support shorter version strings like 1.2.3, which have no prerelease values, since we can assume that those are just regular releases by default. Adding this here as the fourth release type value ensures that regular releases are sorted after prereleases when we compare them.

Finally, we define #to_bitpack, which will return all of these values packed together as a single 32-bit integer. As described above, we use the left-shift << operator to move each component into the desired bit-places, and the or operator | to add these all together.

But does it work? In the previous part, we finished by saying that, following our design, the SemVer string 52.123.12-beta.2 would be represented as the integer 1748861322 (binary value: 0110100 0001111011 0000001100 01 010).

Let’s run our code in the console and see what we get?

SemanticVersion.new('52.123.12-beta.2').to_bitpack
=> 1748861322

It works!

Stitching it all together

Now, in order to sort our UserDevices in the correct SemVer order, all we need to do is ensure our #latest_app_version_bitpack column is properly populated. There are a few ways we can achieve this, but the simplest way is probably just to add a callback to our model to update this column when the #latest_app_version changes:

# == Schema Information
#
# Table name: user_devices
#
#  id                          :bigint           not null, primary key
#  user_id                     :bigint           not null
#  operating_system            :text             not null
#  latest_app_version          :string           not null
#  latest_app_version_bitpack  :integer
#  created_at                  :datetime         not null
#  updated_at                  :datetime         not null
#
class UserDevice < ApplicationRecord

  scope :ordered_by_latest_app_version, 
        -> { order(:latest_app_version) }


  before_save :set_latest_app_version_bitpack, if: :latest_app_version_changed?
  
  private 
  
  def set_latest_app_version_bitpack
    self.latest_app_version_bitpack = SemanticVersion.new(latest_app_version).to_bitpack
  end
end

This should ensure that our #latest_app_version_bitpack is always kept up to date with the integer representation of for the current value of #latest_app_version.

Lastly, let’s update our .ordered_by_latest_app_version to sort by #set_latest_app_version_bitpack:

# == Schema Information
#
# Table name: user_devices
#
#  id                          :bigint           not null, primary key
#  user_id                     :bigint           not null
#  operating_system            :text             not null
#  latest_app_version          :string           not null
#  latest_app_version_bitpack  :integer
#  created_at                  :datetime         not null
#  updated_at                  :datetime         not null
#
class UserDevice < ApplicationRecord

  scope :ordered_by_latest_app_version, 
        -> { order(:latest_app_version_bitpack) }


  before_save :set_latest_app_version_bitpack, if: :latest_app_version_changed?
  
  private 
  
  def set_latest_app_version_bitpack
    self.latest_app_version_bitpack = SemanticVersion.new(latest_app_version).to_bitpack
  end
end

Now, when we call UserDevice.ordered_by_latest_app_version, the result should be sorted in order of proper SemVer precedence.

Conclusion

Bitpacking is a fun and efficient way of storing data like this, and can be really effective when we need to sort it quickly. But handling important data conversions like this in our application code can still lead to bugs and issues, that might cause our data to go out of sync. In part 3, we’ll look at how we can move all of this into the database with generated columns and functions.

Happy coding

Written by

Photo of Gavin Morrice
Gavin Morrice

Software engineer based in Scotland

Connect with me