Handy Rails Tips

Sortable Semantic Version Strings in Rails (part 3)

This post is part 3 of 3 describing how Semantic Version strings can be converted to sortable integers for easier and faster sorting.

  • Part 1 explains the concept and the constraints of this method
  • Part 2 describes how to implement this in ActiveRecord
  • Part 3 describes how to implement this in PostgreSQL.

Implementing Semantic Version strings in PostgreSQL

In the previous post, we implemented a Ruby class that will convert SemVer strings into unique integers that can be used for reliable and fast sorting of semantic version strings.

However, this implementation has some drawbacks. For one, setting the data in the application and updating it when the SemVer string is updated in the database introduces a vulnerability. In the case where the data is changed, either through direct access to the DB, or by calling update_column(:latest_app_version, '1.200.3'), it’s possible for the value of latest_app_version_bitpack to fall out of step with the value of latest_app_version—which would introduce buggy behaviour.

A safer solution would be to simply move this behaviour to the database, and let the database manage the value of the latest_app_version_bitpack column. If we could have our database automatically update the latest_app_version_bitpack column each time latest_app_version changes, we can be more certain that the two values still represent the same thing.

Adding a semver function to PostgreSQL

So that we don’t have to perform complex bitpacking operations multiple times, we should define a function that encapsulates the behaviour we wish to create. Namely, a function that will convert a SemVer string into a unique integer value, just like we did in part 2.

To do this in PostgreSQL, we can define a function that will live in our database schema:

Let’s add a migration that defines our function:

bin/rails g migration add_semver_function

Next, we should add our function definition to the migration:

class AddSemverFunction < ActiveRecord::Migration[7.1]
  def up
    execute <<-SQL
        CREATE OR REPLACE FUNCTION semver_to_bitpack(version_string TEXT) RETURNS BIGINT AS
$$
DECLARE
    -- This regular expression will match most semantic version strings with optional release types
    -- alpha, beta, and rc 
    semver_regexp    TEXT := '^([0-9]+)\.([0-9]+)\.([0-9]+)(?:-(?:(alpha|beta|rc)\.?([0-9]*)))?(?:\\+[-_a-z0-9]*)?$';
    major            INT  := 0;
    minor            INT  := 0;
    patch            INT  := 0;
    release_type     TEXT := 'release';
    build            INT  := 0;
    release_type_int INT;
    results_array    TEXT[];
BEGIN
    if version_string !~* semver_regexp then
        raise exception 'Invalid semver string: %', version_string
            using hint = 'Valid Semver includes major.minor.patch(-pre-release)(+build) (see https://semver.org/)';
    end if;

    results_array := regexp_matches(version_string, semver_regexp);
    major := results_array[1]::INT;
    minor := results_array[2]::INT;
    patch := results_array[3]::INT;
    if results_array[4] is not null then
        release_type := results_array[4];
    end if;
    if results_array[5] is not null then
        build := results_array[5]::INT;
    end if;

    release_type_int = array_position(ARRAY ['alpha', 'beta', 'rc', 'release'], release_type)::int - 1;

    RETURN ((major::BIGINT << 25) |
            (minor::BIGINT << 15) |
            (patch::BIGINT << 5) |
            (release_type_int::BIGINT << 3) |
            (build::BIGINT));
END ;
$$ LANGUAGE plpgsql IMMUTABLE
                    STRICT;
    SQL
  end

  def down
    execute "DROP FUNCTION semver_to_bitpack(TEXT)"
  end
end

Understanding this PostgreSQL Function

Let’s have a look at the PostgreSQL function definition and go through it step by step

CREATE OR REPLACE FUNCTION semver_to_bitpack(version_string TEXT) RETURNS BIGINT AS $$
DECLARE
semver_regexp    TEXT := '^([0-9]+)\\.([0-9]+)\\.([0-9]+)(?:-(?:(alpha|beta|rc)\.?([0-9]*)))?(?:\\+[-_a-z0-9]*)?$';
major            INT  := 0;
minor            INT  := 0;
patch            INT  := 0;
release_type     TEXT := 'release';
build            INT  := 0;
release_type_int INT;
results_array    TEXT[];
BEGIN
IF version_string !~* semver_regexp THEN
RAISE EXCEPTION 'Invalid SemVer string: %', version_string
USING hint = 'Valid SemVer includes major.minor.patch(-pre-release)(+build) (see https://semver.org/)';
END IF;

    results_array := regexp_matches(version_string, semver_regexp);
    major := results_array[1]::INT;
    minor := results_array[2]::INT;
    patch := results_array[3]::INT;

    IF results_array[4] IS NOT NULL THEN
        release_type := results_array[4];
    END IF;
    
    IF results_array[5] IS NOT NULL THEN
        build := results_array[5]::INT;
    END IF;

    release_type_int := array_position(ARRAY ['alpha', 'beta', 'rc', 'release'], release_type)::INT - 1;

    RETURN ((major::BIGINT << 25) |
            (minor::BIGINT << 15) |
            (patch::BIGINT << 5) |
            (release_type_int::BIGINT << 3) |
            (build::BIGINT));
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

Step 1: Declaring Variables

DECLARE
semver_regexp    TEXT := '^([0-9]+)\\.([0-9]+)\\.([0-9]+)(?:-(?:(alpha|beta|rc)\.?([0-9]*)))?(?:\\+[-_a-z0-9]*)?$';
major            INT  := 0;
minor            INT  := 0;
patch            INT  := 0;
release_type     TEXT := 'release';
build            INT  := 0;
release_type_int INT;
results_array    TEXT[];

First, we declare several variables to hold different parts of the SemVer string:

semver_regexp: The regular expression used to extract SemVer components. major, minor, patch: Integers to store the version components (e.g., 1.2.3). release_type: The pre-release type (e.g., alpha, beta, rc, or release). build: The build number associated with the pre-release, defaulting to 0. release_type_int: The integer representation of the release type. results_array: An empty array to hold the matches from the regular expression.

Step 2: Validating the SemVer String

IF version_string !~* semver_regexp THEN
RAISE EXCEPTION 'Invalid SemVer string: %', version_string
USING hint = 'Valid SemVer includes major.minor.patch(-pre-release)(+build) (see https://semver.org/)';
END IF;

We use the regular expression (semver_regexp) to validate that the input version_string conforms to the SemVer format.

If the input does not match, the function raises an exception with a helpful hint.

Step 3: Extracting Components Using the Regular Expression

results_array := regexp_matches(version_string, semver_regexp);
major := results_array[1]::INT;
minor := results_array[2]::INT;
patch := results_array[3]::INT;
The regexp_matches() function extracts components from the input string.
results_array contains the matched parts of the SemVer string:
results_array[1]: The major version.
results_array[2]: The minor version.
results_array[3]: The patch version.

Step 4: Handling Optional Pre-Release and Build Information

IF results_array[4] IS NOT NULL THEN
release_type := results_array[4];
END IF;

IF results_array[5] IS NOT NULL THEN
build := results_array[5]::INT;
END IF;

If the pre-release type (e.g., alpha, beta, or rc) is present, it is stored in the release_type variable. If a build number (e.g., 3 in 1.0.0-beta.3) is present, it is stored in the build variable.

Step 5: Converting the Pre-Release Type to an Integer

release_type_int := array_position(ARRAY ['alpha', 'beta', 'rc', 'release'], release_type)::INT - 1;

The array_position() function finds the index of the release_type within the array [‘alpha’, ‘beta’, ‘rc’, ‘release’]. We subtract 1 so that alpha = 0, beta = 1, rc = 2, and release = 3. This preserves the proper order precedence when bit-packing.

Step 6: Bit-Packing the Components into a Single Integer

RETURN ((major::BIGINT << 25) |
(minor::BIGINT << 15) |
(patch::BIGINT << 5) |
(release_type_int::BIGINT << 3) |
(build::BIGINT));

Left-shifting («) is used to allocate bits for each component:

  • major::BIGINT << 25: The major version occupies the leftmost 7 bits.
  • minor::BIGINT << 15: The minor version occupies the next 10 bits.
  • patch::BIGINT << 5: The patch version occupies the next 10 bits.
  • release_type_int::BIGINT << 3: The pre-release type occupies 2 bits.
  • build::BIGINT: The build version occupies the remaining 3 bits.

Step 7: Marking the Function as Immutable

$$ LANGUAGE plpgsql IMMUTABLE STRICT;

IMMUTABLE: Indicates that the function always returns the same result for the same input, allowing PostgreSQL to optimise queries. STRICT: Indicates that the function should return NULL if any input is NULL.

Running the migration

Now, run the migration to add this function to our database.

Note: If you use ruby schema files (./db/schema.rb), this function won’t be visible there. If you’d like a visible indication that the function exists, you can use SQL based schema files instead, but this is not necessary.

# config/application.rb
module MyApp
  class Application < Rails::Application
    config.active_record.schema_format = :sql
  end 
end

Generated columns in Rails

PostgreSQL comes with a feature called generated columns. These are basically just columns in which the value is always computed from another column(s) value. For example, you could create a database column that contains size dimensions in centimeters, and have another generated column that will store that value multiplied by 2.54 to represent inches. Whenever the centimetres column changes, the inches column will change with it. This is a great way to maintain consistency across the data.

Creating a generated column in Rails is really easy. Let’s add a migration to add a new column to our user_devices table:

$ bin/rails generate migration add_latest_app_version_int_to_user_devices latest_app_version_int:integer

This will create a new file with the following code

class AddLatestAppVersionIntToUserDevices < ActiveRecord::Migration[8.0]
  def change
    add_column :user_devices, :latest_app_version_int, :integer
  end
end

Update this, to tell Rails that our column should be a stored virtual column instead of a regular (writeable) integer column:

class AddLatestAppVersionIntToUserDevices < ActiveRecord::Migration[8.0]
  def change
    add_column :user_devices, :latest_app_version_int, :integer, as: 'semver_to_bitpack(latest_app_version)', stored: true
  end
end

Here, we’ve defined our column to set its value as the output of our semver_to_bitpack function, with the current value of latest_app_version as its input value.

Run the migration by calling

bin/rails db:migrate

And, that’s it!

We should now have a column called latest_app_version_int on our user_devices table that will always store an up to date reference of the latest_app_version column.

With this in place, we can remove the extra code we added in Part 2, reverting our UserDevice model to simply:

# == Schema Information
#
# Table name: user_devices
#
#  id                          :bigint           not null, primary key
#  user_id                     :bigint           not null
#  operating_system            :text             not null
#  latest_app_version          :string           not null
#  latest_app_version_bitpack  :integer # This column can be dropped
#  latest_app_version_int      :integer
#  created_at                  :datetime         not null
#  updated_at                  :datetime         not null
#
class UserDevice < ApplicationRecord

  scope :ordered_by_latest_app_version, 
        -> { order(:latest_app_version_int) }
end

Conclusion

Bit-packing is a fun and efficient way of storing data into very small packs. The approach demonstrated here is one way of solving the issue of sorting semver strings that is scalable, highly performant, and takes very little effort to set up once you understand the concepts.