This post is part 2 of 3 describing how Semantic Version strings can be converted to sortable integers for easier and faster sorting.
- Part 1 explains the concept and the constraints of this method
- Part 2 describes how to implement this in ActiveRecord
- Part 3 (TODO) describes an alternative approach that is SQL driven
Implementing this in ActiveRecord
In the previous post, we walked through the concept of how we can store a SemVer as a 32-bit integer so that we can properly sort records in the correct order precedence for SemVer. Now, let’s have a look at how we could add this to a hypothetical ActiveRecord
model.
An example model
Consider a model called UserDevice
that tracks the existence of a mobile device for a user. Since a user might be logged in on more than one device, we can say that a user has many user devices. This model and table records the user, the device operating system, and the latest known app version on that device:
# == Schema Information
#
# Table name: user_devices
#
# id :bigint not null, primary key
# user_id :bigint not null
# operating_system :text not null
# latest_app_version :string not null
# created_at :datetime not null
# updated_at :datetime not null
#
class UserDevice < ApplicationRecord
# ...
scope :ordered_by_latest_app_version,
-> { order(:latest_app_version) }
# ...
end
This version of our model currently suffers from the bug mentioned in the part 1 post. Namely, that records will appear in the wrong order when sorted by latest_app_version
(i.e. 1.10.0
would come before 1.9.0
).
To fix this, we need a way of storing our latest_app_version
SemVer string into a sortable integer representation.
We probably also want to see a readable version of the SemVer string that makes sense, so we should store our sortable column as an additional column on our table, and leave the current (more readable) latest_app_version
in place
We can add this with a simple database migration:
$ rails g migration add_latest_app_version_bitpack_to_user_devices latest_app_version_bitpack:integer
This should create a migration that looks something like:
class AddLatestAppVersionBitpackToUserDevices < ActiveRecord::Migration[8.0]
def change
add_column :user_devices,
:latest_app_version_bitpack,
:integer
end
end
Let’s run that migration and update the table to include our new column!
$ rails db:migrate
Now we can see in the schema information comments above our model, that the latest_app_version_bitpack
has been added and is defined as an integer
# == Schema Information
#
# Table name: user_devices
#
# id :bigint not null, primary key
# user_id :bigint not null
# operating_system :text not null
# latest_app_version :string not null
# latest_app_version_bitpack :integer
# created_at :datetime not null
# updated_at :datetime not null
#
class UserDevice < ApplicationRecord
# ...
scope :ordered_by_latest_app_version,
-> { order(:latest_app_version) }
# ...
end
This column won’t be usable yet, though, because we haven’t populated it with any data. To do so, we’ll have to find a way to create our BitPack representations of a SemVer string that we designed in Part 1. Let’s do that now…
Bitwise operators in Ruby
Ruby supports standard bitwise operators, with the following operator methods:
8 << 2 # => 32 (Left-shift)
8 >> 2 # => 2 (Right-shift)
8 | 2 # => 10 (Or)
8 & 2 # => 0 (And)
8 ^ 2 # => 0 (Xor)
~8 # => -9 (Not)
Luckily, we’re only interested in the two of these for this solution: the <<
and |
operators.
Shift operators
In Part 1, we were able to assign portions of our long 32-bit binary value, so that sub-sections of that number meant something to us. This allows us to encode multiple values in the same 32-bit binary representation. But how exactly do we assign portions of a binary number like this? That’s exactly what left-shift and right-shift operators do.
Let’s look at a really simple example. Here’s the 8-bit binary representation for the number 1:
0000 0001
But what if we were to shift those numbers all one place to the left? So that it looks like this:
0000 0010
This is exactly what the left-shift operator does. We can try this on the number 1 in Ruby to see the output:
value = (1 << 1) # shift the number 1 one position to the left
sprintf("%08b", value) # Print the 8-bit binary value for this number
# => "00000010"
Simply put, left-shift will take the binary representation of an Integer, and shift all of the numbers one position to the left.
Incidentally, right-shift takes the binary representation of an Integer, and shifts all of the numbers one position to the right.
The OR operator
We’ve looked at how the shift operators move bits around, but how do we actually combine those bits into one integer? That’s where the bitwise OR operator (|
) comes in. It compares each bit of its operands and sets each bit in the result to 1
if that bit is 1
in either operand.
For instance, lets see what the or operator looks like when we use it with these two binary values 00000110
and 00000011
:
binary_value = 0b00000110 | 0b00000011
sprintf("%08b", binary_value)
# => "00000111"
All of the bits that are 0
in both numbers are still 0
in the resulting value. But if any of the bits are 1
in either of the numbers, then they are also 1
in the resulting value. This is similar to a boolean OR that we use in Ruby, since the value is true
if either operand is true
—just applied to each bit in the numbers.
This is a convenient way to concatenate two binary values together. Because each component is shifted into its own region of the 32-bit integer, there is no overlap—making the OR operator an ideal way to combine multiple shifted values into one sortable integer.
Encoding our SemVer string
Now that we understand what the left-shift operator does, we have everything we need to encode a SemVer string into a 32-bit integer in Ruby.
Let’s see how that would look like in code—don’t worry, we’ll go through it bit by bit (pardon the pun):
class SemanticVersion
SEMVER_STRING_REGEXP = /^
(?<major>\d+)\. # Major version number
(?<minor>\d+)\. # Minor version number
(?<patch>\d+) # Patch version number
(-?) # Optional pre-release separator
(?<release_type>alpha|beta|rc)? # Optional pre-release type
(\.?)
(?<build>\d*)? # Optional pre-release version number
(?<metadata>.*)? # Optional metadata (ignored)
$/ix
PACK_SHIFTS = {
major: 25,
minor: 15,
patch: 5,
release_type: 3,
build: 0
}
RELEASE_TYPES = %w[alpha beta rc release]
DEFAULT_RELEASE_TYPE = "release"
attr_reader :version_string
attr_reader :major, :minor, :patch, :release_type, :build
private :major, :minor, :patch, :release_type, :build
def initialize(version_string)
version_string_match = version_string.to_s.match(SEMVER_STRING_REGEXP)
fail "Invalid Semantic Version string: #{version_string}" if version_string_match.nil?
@major = version_string_match[:major].to_i
@minor = version_string_match[:minor].to_i
@patch = version_string_match[:patch].to_i
@release_type = version_string_match[:release_type] || DEFAULT_RELEASE_TYPE
@build = version_string_match[:build].to_i
end
def to_bitpack
(major << PACK_SHIFTS[:major]) |
(minor << PACK_SHIFTS[:minor]) |
(patch << PACK_SHIFTS[:patch]) |
(release_type_index << PACK_SHIFTS[:release_type]) |
(build)
end
private
def release_type_index
RELEASE_TYPES.index(release_type)
end
end
We start by defining a regular expression that matches a SemVer String. The expression uses the exploded modifier, so that we can break it up across multiple lines and make it easier to read:
SEMVER_STRING_REGEXP = /^
(?<major>\d+)\. # Major version number
(?<minor>\d+)\. # Minor version number
(?<patch>\d+) # Patch version number
(-?) # Optional pre-release separator
(?<release_type>alpha|beta|rc)? # Optional pre-release type
(\.?)
(?<build>\d*)? # Optional pre-release version number
(?<metadata>.*)? # Optional metadata (ignored)
$/ix
If you’re not used to reading regular expressions in this format, don’t stress too much. For now, it’s enough to say that this regular expression will capture the important components of a SemVer string using the following named captures:
SEMVER_STRING_REGEXP.names
=> ["major", "minor", "patch", "release_type", "build", "metadata"]
Next, we define a constant that stores the number of places that we will left shift each captured section of the string by, when we encode it as 32-bit binary. These values match the sizes that we defined for each segment in Part 1.
PACK_SHIFTS = {
major: 25,
minor: 15,
patch: 5,
release_type: 3,
build: 0
}
We then define a constant to hold all of the release types that we decided we would support. This will act as as sort of enum, and allow us to easily convert a release type to an integer (and binary) value. For example:
RELEASE_TYPES = %w[alpha beta rc release]
release_type_index = RELEASE_TYPES.index('rc') # => 2
release_type_index.to_s(2) # => '10'
In the initializer, we set all of the components of our SemVer string to instance variables. Because we have defined named captures in our
regular expression, this is really simple to map to the match value object returned when we match the version_string
against SEMVER_STRING_REGEXP
.
version_string_match = version_string.to_s.match(SEMVER_STRING_REGEXP)
@major = version_string_match[:major].to_i
@minor = version_string_match[:minor].to_i
@patch = version_string_match[:patch].to_i
@release_type = version_string_match[:release_type] || DEFAULT_RELEASE_TYPE
@build = version_string_match[:build].to_i
You’ll notice the @release_type
is set to DEFAULT_RELEASE_TYPE
if there’s no match value. This is to support shorter version strings like 1.2.3
, which have no prerelease values, since we can assume that those are just regular releases by default. Adding this here as the fourth release type value ensures that regular releases are sorted after prereleases when we compare them.
Finally, we define #to_bitpack
, which will return all of these values packed together as a single 32-bit integer. As described above, we use the left-shift <<
operator to move each component into the desired bit-places, and the or operator |
to add these all together.
But does it work? In the previous part, we finished by saying that, following our design, the SemVer string 52.123.12-beta.2
would be represented as the integer 1748861322
(binary value: 0110100 0001111011 0000001100 01 010
).
Let’s run our code in the console and see what we get?
SemanticVersion.new('52.123.12-beta.2').to_bitpack
=> 1748861322
It works!
Stitching it all together
Now, in order to sort our UserDevices
in the correct SemVer order, all we need to do is ensure our #latest_app_version_bitpack
column is properly populated. There are a few ways we can achieve this, but the simplest way is probably just to add a callback to our model to update this column when the #latest_app_version
changes:
# == Schema Information
#
# Table name: user_devices
#
# id :bigint not null, primary key
# user_id :bigint not null
# operating_system :text not null
# latest_app_version :string not null
# latest_app_version_bitpack :integer
# created_at :datetime not null
# updated_at :datetime not null
#
class UserDevice < ApplicationRecord
scope :ordered_by_latest_app_version,
-> { order(:latest_app_version) }
before_save :set_latest_app_version_bitpack, if: :latest_app_version_changed?
private
def set_latest_app_version_bitpack
self.latest_app_version_bitpack = SemanticVersion.new(latest_app_version).to_bitpack
end
end
This should ensure that our #latest_app_version_bitpack
is always kept up to date with the integer representation of for the current value of #latest_app_version
.
Lastly, let’s update our .ordered_by_latest_app_version
to sort by #set_latest_app_version_bitpack
:
# == Schema Information
#
# Table name: user_devices
#
# id :bigint not null, primary key
# user_id :bigint not null
# operating_system :text not null
# latest_app_version :string not null
# latest_app_version_bitpack :integer
# created_at :datetime not null
# updated_at :datetime not null
#
class UserDevice < ApplicationRecord
scope :ordered_by_latest_app_version,
-> { order(:latest_app_version_bitpack) }
before_save :set_latest_app_version_bitpack, if: :latest_app_version_changed?
private
def set_latest_app_version_bitpack
self.latest_app_version_bitpack = SemanticVersion.new(latest_app_version).to_bitpack
end
end
Now, when we call UserDevice.ordered_by_latest_app_version
, the result should be sorted in order of proper SemVer precedence.
Conclusion
Bitpacking is a fun and efficient way of storing data like this, and can be really effective when we need to sort it quickly. But handling important data conversions like this in our application code can still lead to bugs and issues, that might cause our data to go out of sync. In part 3, we’ll look at how we can move all of this into the database with generated columns and functions.
Happy coding