Migrations on large tables in Rails


Hey there.

Today we’ll discuss migrations and related problems on a large tables.

Let’s start from easy thing - just add column for a table with 10 million rows.

Task 1:

Add two columns to the products table with default values.

class AddFieldsToProducts < ActiveRecord::Migration
  def change
    add_column :products, :shipped, :boolean, default: false
    add_column :products, :cancelled, :boolean, default: false
  end
end

In this case your migration takes hours, and the products table will be locked.

Solution 1:

What you should do? Postgres can create null columns super fast even on a very big tables.

We can split our migration to the two tasks:

  • Creating the columns
  • Populating the default value

You can do the last step in background for example with Sidekiq or any other background job, or for example do a batch updates.

Cool. Let’s move forward.

Task 2:

Let’s change column type in a huge table.

Solution 2:

Instead, add a new column with a new name/type - you can add new column, update your code to use new one, and when you’ve completed your upgrades, you can DROP the column. Another integersting thing - with ActiveRecord you cou can hide the old column from your code like this:

class Product < ActiveRecord::Base
  # TODO: You should remove this after completing migration
  def self.columns
    super.reject { |c| c.name == 'status' }
  end
end

Task 3:

Removing a column

Solution 3:

In this case - instead of dropping column, your first step should be telling ActiveRecord to stop writing to this column and modify your code to not use it.

Let’s take a look into an example:

class Products < ActiveRecord::Base
  def self.columns
    super.reject { |column| column.name == "status" }
  end
end

After deploying modified code, you can simple remove column as usual:

class RemoveStatusFromUsers < ActiveRecord::Migration
  def change
    remove_column :users, :status, :string
  end
end

Task 4:

Renaming a column

Solution 4:

To rename a column in Rails and ActiveRecord we need to do 3 deployments.

As an example, let’s imagine we have a status column on our products table which we would like to rename to current_status.

You can start from the adding new column and tell ActiveRecord to write to both the old and the new columns, but keep reading from the old column.

Let’s add a column first:

class AddCurrentStatusToProducts < ActiveRecord::Migration
  def change
    add_column :products, :current_status, :datetime
  end
end

Let’s tell AR to set and read from new column:

class Product < ActiveRecord::Base
  def current_status=(value)
    self[:status] = value
    self[:current_status] = value
  end

  def current_status
    self[:current_status]
  end
end

Ok, let’s deploy your changes, and wait a little to sync all columns (btw you can use some background worker to populate data). After deploying you can use your new column.

class SyncColumnsInProducts < ActiveRecord::Migration
  def up
    execute "UPDATE products SET current_status = status;"
  end

  def down
  end
end

Let’s use new column name:

class Product < ActiveRecord::Base
  def self.columns
    super.reject { |column| column.name == "status" }
  end
end

After deploing this change, you can simple remove column:

class RemoveStatusFromProducts < ActiveRecord::Migration
  def change
    remove_column :products, :status
  end
end

Bonus tip

If some parts of your application use heavy and slow queries, good choice here to use read-only replica of your main DB.

You can just setup it and route all your necessary queries into it. It can help you to avoid additional timeouts and perfomance problems on main DB.