SQL Query Guidelines

This document describes various guidelines to follow when writing SQL queries, either using ActiveRecord/Arel or raw SQL queries.

Using LIKE Statements

The most common way to search for data is using the LIKE statement. For example, to get all issues with a title starting with "WIP:" you'd write the following query:

SELECT * FROM issues WHERE title LIKE 'WIP:%';

On PostgreSQL the LIKE statement is case-sensitive. On MySQL this depends on the case-sensitivity of the collation, which is usually case-insensitive. To perform a case-insensitive LIKE on PostgreSQL you have to use ILIKE instead. This statement in turn isn't supported on MySQL.

To work around this problem you should write LIKE queries using Arel instead of raw SQL fragments, as Arel automatically uses ILIKE on PostgreSQL and LIKE on MySQL. This means that instead of this:

Issue.where('title LIKE ?', 'WIP:%')

You'd write this instead:

Issue.where(Issue.arel_table[:title].matches('WIP:%'))

Here matches generates the correct LIKE/ILIKE statement depending on the database being used.

If you need to chain multiple OR conditions you can also do this using Arel:

table = Issue.arel_table
Issue.where(table[:title].matches('WIP:%').or(table[:foo].matches('WIP:%')))

For PostgreSQL this produces:

SELECT * FROM issues WHERE (title ILIKE 'WIP:%' OR foo ILIKE 'WIP:%')

In turn for MySQL this produces:

SELECT * FROM issues WHERE (title LIKE 'WIP:%' OR foo LIKE 'WIP:%')

LIKE & Indexes

Neither PostgreSQL nor MySQL uses any indexes when using LIKE/ILIKE with a wildcard at the start. For example, this will not use any indexes:

SELECT * FROM issues WHERE title ILIKE '%WIP:%';

Because the value for ILIKE starts with a wildcard, the database is not able to use an index as it doesn't know where to start scanning the index.
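The difference between the two patterns can be sketched in plain Ruby. This is only an analogy: a sorted array stands in for a B-tree index, and the method names and sample titles are made up for illustration. A prefix search can jump to its starting point with a binary search, while a substring search has to inspect every entry.

```ruby
# Illustrative only: a sorted Ruby array stands in for a B-tree index.
TITLES = ['Fix login', 'Not WIP: done', 'Update docs',
          'WIP: add index', 'WIP: refactor'].sort.freeze

# Roughly LIKE 'WIP:%': binary search to the first candidate, then read
# neighbouring entries for as long as they still share the prefix.
def prefix_matches(sorted, prefix)
  start = sorted.bsearch_index { |title| title >= prefix } || sorted.length
  sorted.drop(start).take_while { |title| title.start_with?(prefix) }
end

# Roughly LIKE '%WIP:%': the sort order gives no starting point, so every
# entry has to be checked.
def substring_matches(sorted, needle)
  sorted.select { |title| title.include?(needle) }
end

puts prefix_matches(TITLES, 'WIP:').inspect
puts substring_matches(TITLES, 'WIP:').inspect
```

The first call only touches the entries around the match, whereas the second walks the whole array, which is comparable to a sequential scan.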

MySQL provides no known solution to this problem. Luckily PostgreSQL does provide a solution: trigram GIN indexes. These indexes can be created as follows:

CREATE INDEX [CONCURRENTLY] index_name_here ON table_name USING GIN(column_name gin_trgm_ops);

The key here is the GIN(column_name gin_trgm_ops) part. This creates a GIN index with the operator class set to gin_trgm_ops. These indexes can be used by ILIKE/LIKE and can lead to greatly improved performance. One downside of these indexes is that they can easily get quite large (depending on the amount of data indexed).
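To get a feel for why a trigram index can serve a pattern with a leading wildcard, the following Ruby sketch approximates how text is split into trigrams. It is a simplification that assumes ASCII input and a single alphanumeric search fragment. The method names are hypothetical and the real pg_trgm extension handles more cases.

```ruby
# Simplified approximation of trigram extraction: lowercase the text, split
# it into alphanumeric words, pad each word with two leading spaces and one
# trailing space, then take every 3-character window.
def trigrams(text)
  text.downcase.scan(/[a-z0-9]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end.uniq
end

# Trigrams of a fragment searched for with wildcards on both sides, so no
# padding is applied. Assumes the fragment is a single alphanumeric run.
def inner_trigrams(fragment)
  lowered = fragment.downcase
  (0..lowered.length - 3).map { |i| lowered[i, 3] }.uniq
end

# A row can only match '%fragment%' if it contains all of the fragment's
# trigrams, which lets an index narrow down candidates without a full scan.
def candidate?(text, fragment)
  (inner_trigrams(fragment) - trigrams(text)).empty?
end

puts trigrams('cat').inspect
puts candidate?('Fix GitLab bug', 'itla')
puts candidate?('Fix login', 'itla')
```

Fragments shorter than three characters produce no trigrams in this sketch, so every row remains a candidate for them.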

To keep naming of these indexes consistent please use the following naming pattern:

index_TABLE_on_COLUMN_trigram

For example, a GIN/trigram index for issues.title would be called index_issues_on_title_trigram.
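The naming pattern is simple enough to capture in a small helper. The method below is hypothetical and not part of GitLab; it only shows how the pattern expands for a given table and column.

```ruby
# Hypothetical helper: builds an index name following the
# index_TABLE_on_COLUMN_trigram pattern described above.
def trigram_index_name(table, column)
  "index_#{table}_on_#{column}_trigram"
end

puts trigram_index_name(:issues, :title)
```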

Because these indexes can take quite some time to build, they should be built concurrently. This can be done by using CREATE INDEX CONCURRENTLY instead of just CREATE INDEX. Concurrent indexes cannot be created inside a transaction. Transactions for migrations can be disabled using the following pattern:

class MigrationName < ActiveRecord::Migration
  disable_ddl_transaction!
end

For example:

class AddUsersLowerUsernameEmailIndexes < ActiveRecord::Migration
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction.
  disable_ddl_transaction!

  def up
    # Expression indexes are only created on PostgreSQL.
    return unless Gitlab::Database.postgresql?
    execute 'CREATE INDEX CONCURRENTLY index_on_users_lower_username ON users (LOWER(username));'
    execute 'CREATE INDEX CONCURRENTLY index_on_users_lower_email ON users (LOWER(email));'
  end

  def down
    return unless Gitlab::Database.postgresql?
    # Remove by index name; a bare symbol would be treated as a column name.
    remove_index :users, name: :index_on_users_lower_username
    remove_index :users, name: :index_on_users_lower_email
  end
end

Plucking IDs

This can't be stressed enough: never use ActiveRecord's pluck to pluck a set of values into memory only to use them as an argument for another query. For example, this will make the database very sad:

projects = Project.all.pluck(:id)
MergeRequest.where(source_project_id: projects)

Instead you can just use sub-queries which perform far better:

MergeRequest.where(source_project_id: Project.all.select(:id))

The only time you should use pluck is when you actually need to operate on the values in Ruby itself (e.g. write them to a file). In almost all other cases you should ask yourself "Can I not just use a sub-query?".

Use UNIONs

UNIONs aren't very commonly used in most Rails applications but they're very powerful and useful. In most applications queries tend to use a lot of JOINs to get related data or data based on certain criteria, but JOIN performance can quickly deteriorate as the data involved grows.

For example, if you want to get a list of projects where the name contains a value or the name of the namespace contains a value, most people would write the following query:

SELECT *
FROM projects
JOIN namespaces ON namespaces.id = projects.namespace_id
WHERE projects.name ILIKE '%gitlab%'
OR namespaces.name ILIKE '%gitlab%';

Using a large database this query can easily take around 800 milliseconds to run. Using a UNION we'd write the following instead:

SELECT projects.*
FROM projects
WHERE projects.name ILIKE '%gitlab%'
UNION
SELECT projects.*
FROM projects
JOIN namespaces ON namespaces.id = projects.namespace_id
WHERE namespaces.name ILIKE '%gitlab%';

This query only takes around 15 milliseconds to complete while returning the exact same records.

This doesn't mean you should start using UNIONs everywhere, but it's something to keep in mind when using lots of JOINs in a query and filtering out records based on the joined data.

GitLab comes with a Gitlab::SQL::Union class that can be used to build a UNION of multiple ActiveRecord::Relation objects. You can use this class as follows:

union = Gitlab::SQL::Union.new([projects, more_projects, ...])
Project.from("(#{union.to_sql}) projects")
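To make the shape of the resulting SQL concrete, here is a rough sketch of joining SQL fragments with UNION in plain Ruby. The union_sql method is a hypothetical stand-in for illustration only; it is not the implementation of Gitlab::SQL::Union, which works on ActiveRecord::Relation objects rather than raw strings.

```ruby
# Hypothetical helper, NOT GitLab's Gitlab::SQL::Union: wraps each SQL
# fragment in parentheses and joins the fragments with UNION.
def union_sql(fragments)
  fragments.map { |sql| "(#{sql})" }.join("\nUNION\n")
end

by_project = "SELECT projects.* FROM projects " \
             "WHERE projects.name ILIKE '%gitlab%'"
by_namespace = "SELECT projects.* FROM projects " \
               "JOIN namespaces ON namespaces.id = projects.namespace_id " \
               "WHERE namespaces.name ILIKE '%gitlab%'"

# The UNION is used as a sub-query aliased to the table name, mirroring the
# Project.from("(...) projects") call above.
puts "SELECT * FROM (#{union_sql([by_project, by_namespace])}) projects"
```

Aliasing the sub-query as projects means further conditions on projects columns can still be chained onto the outer query.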

Ordering by Creation Date

When ordering records based on the time they were created you can simply order by the id column instead of ordering by created_at. Because IDs are always unique and incremented in the order that rows are created, this will produce the exact same results. This also means there's no need to add an index on created_at to ensure consistent performance, as id is already indexed by default.
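The equivalence can be checked on in-memory data with a small Ruby sketch. The Row struct and same_order? method are made up for this illustration, and the sample rows assume IDs were handed out in creation order.

```ruby
# Illustrative stand-in for database rows; not an ActiveRecord model.
Row = Struct.new(:id, :created_at)

# True when sorting by id yields the same sequence as sorting by created_at.
def same_order?(rows)
  rows.sort_by(&:id).map(&:id) == rows.sort_by(&:created_at).map(&:id)
end

ROWS = [
  Row.new(2, Time.utc(2016, 1, 2)),
  Row.new(1, Time.utc(2016, 1, 1)),
  Row.new(3, Time.utc(2016, 1, 3))
].freeze

puts same_order?(ROWS)
```

The check fails for rows whose created_at was set by hand to a value that disagrees with the insertion order, so the shortcut relies on timestamps being assigned at insert time.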

Use WHERE EXISTS instead of WHERE IN

While WHERE IN and WHERE EXISTS can be used to produce the same data, it is recommended to use WHERE EXISTS whenever possible. While in many cases PostgreSQL can optimise WHERE IN quite well, there are also many cases where WHERE EXISTS will perform (much) better.

In Rails you have to use this by creating SQL fragments:

Project.where('EXISTS (?)', User.select(1).where('projects.creator_id = users.id AND users.foo = X'))

This would then produce a query along the lines of the following:

SELECT *
FROM projects
WHERE EXISTS (
  SELECT 1
  FROM users
  WHERE projects.creator_id = users.id
  AND users.foo = X
)