Sortable Semantic Version Strings in Rails (part 3)
This post is part 3 of 3 describing how Semantic Version strings can be converted to sortable integers for easier and faster sorting.
- Part 1 explains the concept and the constraints of this method
- Part 2 describes how to implement this in ActiveRecord
- Part 3 describes how to implement this in PostgreSQL.
Implementing Semantic Version strings in PostgreSQL
In the previous post, we implemented a Ruby class that will convert SemVer strings into unique integers that can be used for reliable and fast sorting of semantic version strings.
However, this implementation has some drawbacks. For one, setting the data in the application and updating it when the SemVer string is updated in the database introduces a vulnerability. In the case where the data is changed, either through direct access to the DB, or by calling update_column(:latest_app_version, '1.200.3')
, it’s possible for the value of latest_app_version_bitpack
to fall out of step with the value of latest_app_version
—which would introduce buggy behaviour.
A safer solution would be to simply move this behaviour to the database, and let the database manage the value of the latest_app_version_bitpack
column. If we could have our database automatically update the latest_app_version_bitpack
column each time latest_app_version
changes, we can be more certain that the two values still represent the same thing.
Adding a semver function to PostgreSQL
So that we don’t have to perform complex bitpacking operations multiple times, we should define a function that encapsulates the behaviour we wish to create. Namely, a function that will convert a SemVer string into a unique integer value, just like we did in part 2.
To do this in PostgreSQL, we can define a function that will live in our database schema:
Let’s add a migration that defines our function:
bin/rails g migration add_semver_function
Next, we should add our function definition to the migration:
class AddSemverFunction < ActiveRecord::Migration[7.1]
def up
execute <<-SQL
CREATE OR REPLACE FUNCTION semver_to_bitpack(version_string TEXT) RETURNS BIGINT AS
$$
DECLARE
-- This regular expression will match most semantic version strings with optional release types
-- alpha, beta, and rc
semver_regexp TEXT := '^([0-9]+)\.([0-9]+)\.([0-9]+)(?:-(?:(alpha|beta|rc)\.?([0-9]*)))?(?:\\+[-_a-z0-9]*)?$';
major INT := 0;
minor INT := 0;
patch INT := 0;
release_type TEXT := 'release';
build INT := 0;
release_type_int INT;
results_array TEXT[];
BEGIN
if version_string !~* semver_regexp then
raise exception 'Invalid semver string: %', version_string
using hint = 'Valid Semver includes major.minor.patch(-pre-release)(+build) (see https://semver.org/)';
end if;
results_array := regexp_matches(version_string, semver_regexp);
major := results_array[1]::INT;
minor := results_array[2]::INT;
patch := results_array[3]::INT;
if results_array[4] is not null then
release_type := results_array[4];
end if;
if results_array[5] is not null then
build := results_array[5]::INT;
end if;
release_type_int = array_position(ARRAY ['alpha', 'beta', 'rc', 'release'], release_type)::int - 1;
RETURN ((major::BIGINT << 25) |
(minor::BIGINT << 15) |
(patch::BIGINT << 5) |
(release_type_int::BIGINT << 3) |
(build::BIGINT));
END ;
$$ LANGUAGE plpgsql IMMUTABLE
STRICT;
SQL
end
def down
execute "DROP FUNCTION semver_to_bitpack(TEXT)"
end
end
Understanding this PostgreSQL Function
Let’s have a look at the PostgreSQL function definition and go through it step by step
CREATE OR REPLACE FUNCTION semver_to_bitpack(version_string TEXT) RETURNS BIGINT AS $$
DECLARE
semver_regexp TEXT := '^([0-9]+)\\.([0-9]+)\\.([0-9]+)(?:-(?:(alpha|beta|rc)\.?([0-9]*)))?(?:\\+[-_a-z0-9]*)?$';
major INT := 0;
minor INT := 0;
patch INT := 0;
release_type TEXT := 'release';
build INT := 0;
release_type_int INT;
results_array TEXT[];
BEGIN
IF version_string !~* semver_regexp THEN
RAISE EXCEPTION 'Invalid SemVer string: %', version_string
USING hint = 'Valid SemVer includes major.minor.patch(-pre-release)(+build) (see https://semver.org/)';
END IF;
results_array := regexp_matches(version_string, semver_regexp);
major := results_array[1]::INT;
minor := results_array[2]::INT;
patch := results_array[3]::INT;
IF results_array[4] IS NOT NULL THEN
release_type := results_array[4];
END IF;
IF results_array[5] IS NOT NULL THEN
build := results_array[5]::INT;
END IF;
release_type_int := array_position(ARRAY ['alpha', 'beta', 'rc', 'release'], release_type)::INT - 1;
RETURN ((major::BIGINT << 25) |
(minor::BIGINT << 15) |
(patch::BIGINT << 5) |
(release_type_int::BIGINT << 3) |
(build::BIGINT));
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
Step 1: Declaring Variables
DECLARE
semver_regexp TEXT := '^([0-9]+)\\.([0-9]+)\\.([0-9]+)(?:-(?:(alpha|beta|rc)\.?([0-9]*)))?(?:\\+[-_a-z0-9]*)?$';
major INT := 0;
minor INT := 0;
patch INT := 0;
release_type TEXT := 'release';
build INT := 0;
release_type_int INT;
results_array TEXT[];
First, we declare several variables to hold different parts of the SemVer string:
semver_regexp: The regular expression used to extract SemVer components. major, minor, patch: Integers to store the version components (e.g., 1.2.3). release_type: The pre-release type (e.g., alpha, beta, rc, or release). build: The build number associated with the pre-release, defaulting to 0. release_type_int: The integer representation of the release type. results_array: An empty array to hold the matches from the regular expression.
Step 2: Validating the SemVer String
IF version_string !~* semver_regexp THEN
RAISE EXCEPTION 'Invalid SemVer string: %', version_string
USING hint = 'Valid SemVer includes major.minor.patch(-pre-release)(+build) (see https://semver.org/)';
END IF;
We use the regular expression (semver_regexp) to validate that the input version_string conforms to the SemVer format.
If the input does not match, the function raises an exception with a helpful hint.
Step 3: Extracting Components Using the Regular Expression
results_array := regexp_matches(version_string, semver_regexp);
major := results_array[1]::INT;
minor := results_array[2]::INT;
patch := results_array[3]::INT;
The regexp_matches() function extracts components from the input string.
results_array contains the matched parts of the SemVer string:
results_array[1]: The major version.
results_array[2]: The minor version.
results_array[3]: The patch version.
Step 4: Handling Optional Pre-Release and Build Information
IF results_array[4] IS NOT NULL THEN
release_type := results_array[4];
END IF;
IF results_array[5] IS NOT NULL THEN
build := results_array[5]::INT;
END IF;
If the pre-release type (e.g., alpha, beta, or rc) is present, it is stored in the release_type variable. If a build number (e.g., 3 in 1.0.0-beta.3) is present, it is stored in the build variable.
Step 5: Converting the Pre-Release Type to an Integer
release_type_int := array_position(ARRAY ['alpha', 'beta', 'rc', 'release'], release_type)::INT - 1;
The array_position() function finds the index of the release_type within the array [‘alpha’, ‘beta’, ‘rc’, ‘release’]. We subtract 1 so that alpha = 0, beta = 1, rc = 2, and release = 3. This preserves the proper order precedence when bit-packing.
Step 6: Bit-Packing the Components into a Single Integer
RETURN ((major::BIGINT << 25) |
(minor::BIGINT << 15) |
(patch::BIGINT << 5) |
(release_type_int::BIGINT << 3) |
(build::BIGINT));
Left-shifting («) is used to allocate bits for each component:
major::BIGINT << 25
: The major version occupies the leftmost 7 bits.minor::BIGINT << 15
: The minor version occupies the next 10 bits.patch::BIGINT << 5
: The patch version occupies the next 10 bits.release_type_int::BIGINT << 3
: The pre-release type occupies 2 bits.build::BIGINT
: The build version occupies the remaining 3 bits.
Step 7: Marking the Function as Immutable
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
IMMUTABLE: Indicates that the function always returns the same result for the same input, allowing PostgreSQL to optimise queries. STRICT: Indicates that the function should return NULL if any input is NULL.
Running the migration
Now, run the migration to add this function to our database.
Note: If you use ruby schema files (./db/schema.rb
), this function won’t be visible there. If you’d like a visible indication
that the function exists, you can use SQL based schema files instead, but this is not necessary.
# config/application.rb
module MyApp
class Application < Rails::Application
config.active_record.schema_format = :sql
end
end
Generated columns in Rails
PostgreSQL comes with a feature called generated columns. These are basically just columns in which the value is always computed from another column(s) value. For example, you could create a database column that contains size dimensions in centimeters, and have another generated column that will store that value multiplied by 2.54 to represent inches. Whenever the centimetres column changes, the inches column will change with it. This is a great way to maintain consistency across the data.
Creating a generated column in Rails is really easy. Let’s add a migration to add a new column to our user_devices
table:
$ bin/rails generate migration add_latest_app_version_int_to_user_devices latest_app_version_int:integer
This will create a new file with the following code
class AddLatestAppVersionIntToUserDevices < ActiveRecord::Migration[8.0]
def change
add_column :user_devices, :latest_app_version_int, :integer
end
end
Update this, to tell Rails that our column should be a stored virtual column instead of a regular (writeable) integer column:
class AddLatestAppVersionIntToUserDevices < ActiveRecord::Migration[8.0]
def change
add_column :user_devices, :latest_app_version_int, :integer, as: 'semver_to_bitpack(latest_app_version)', stored: true
end
end
Here, we’ve defined our column to set its value as the output of our semver_to_bitpack
function, with the current value of latest_app_version
as its input value.
Run the migration by calling
bin/rails db:migrate
And, that’s it!
We should now have a column called latest_app_version_int
on our user_devices
table that will always store an up to date reference of the latest_app_version
column.
With this in place, we can remove the extra code we added in Part 2, reverting our UserDevice
model to simply:
# == Schema Information
#
# Table name: user_devices
#
# id :bigint not null, primary key
# user_id :bigint not null
# operating_system :text not null
# latest_app_version :string not null
# latest_app_version_bitpack :integer # This column can be dropped
# latest_app_version_int :integer
# created_at :datetime not null
# updated_at :datetime not null
#
class UserDevice < ApplicationRecord
scope :ordered_by_latest_app_version,
-> { order(:latest_app_version_int) }
end
Conclusion
Bit-packing is a fun and efficient way of storing data into very small packs. The approach demonstrated here is one way of solving the issue of sorting semver strings that is scalable, highly performant, and takes very little effort to set up once you understand the concepts.