Some people like using Arel. I don’t. I stick with ActiveRecord + some SQL when I’m querying the database from Rails.

We were having this discussion at the office today and I reminisced about an interesting problem somebody had shared on StackOverflow recently. I thought I’d share it here. It can be a fun interview question, if you’re into torturing the candidate.

I’ll lay out the problem as clearly as I can:

  • Two models: User and Report
  • A user can have many reports
  • You have 2 reports (say, with IDs 1 and 2)
  • You want to grab all the users that fulfill any of the following criteria:
    • User has only one report. Report ID is 1.
    • User has only one report. Report ID is 2.
    • User has only two reports. Report IDs are 1 and 2.

Conditions:

  • You must assume that there can be users with 0 reports.
  • You need to do this using ActiveRecord (raw SQL is permitted).
  • You can’t use subqueries.
  • You can’t use multiple queries.

Solution:

First, I jumped at the following solution:

# ruby code

report_ids = [1, 2]

users = User.
          joins(:reports).
          where(reports: {id: report_ids}).
          group("users.id")

But that’s wrong. Suppose user1 could have report 1, 2, and 3; he will fulfill conditions in the code above but fail our requirements which are: user should have either only one or only two reports (with our specific report IDs).

Once I realized this, my next solution was this:

# ruby code

ignore_user_ids = User.
                    joins(:reports).
                    where("reports.id NOT IN (?)", report_ids).
                    pluck("DISTINCT users.id")

users = User.
          joins(:reports).
          where("users.id NOT IN (?)", ignore_user_ids).
          group("users.id")

This was acceptable since the question didn’t specify any query-conditions.

But it bothered me. I wanted ONE query! WITHOUT subqueries! I wanted blood!

I exaggerate. But I did think about it some more and found a one-query solution that involved writing SQL. Here:

# ruby code

ris = report_ids.join(',')
user_columns = User.column_names.join(',')

users = \
  User.
  joins("INNER JOIN reports AS r1 ON (
    (r1.user_id = users.id) AND
    (r1.id IN (#{ris}))
  )"). # inner join our required reports
  joins("LEFT OUTER JOIN reports AS r2 ON (
    (r2.user_id = users.id) AND
    (r2.id NOT IN (#{ris}))
  )"). # outer join all reports we don't require
  select(
    # select user columns and
    # count the number of reports NOT required
    "#{user_columns}, COUNT(r2.id) AS r2c"
  ).
  where(
    # specify the condition on the reports you DON'T want
    "r2c = 0"
  ). # or having("r2c = 0").
  group("users.id")

Now is it wise to actually use that solution in production code? Absolutely not! You don’t want later developers to look at that monstrosity and spend more time than necessary trying to understand what’s going on.

BUT exercises such as these can keep your SQL skills sharp - which is an area where many Rails developers (including myself) have a lot of room for improvement.

Incidentally, if there’s a better way to accomplish what I said, PLEASE let me know! I’m always on the lookout for doing things better.

Link to said StackOverflow post.