SQL conventions¶
Preface¶
Database guidelines at eurodata¶
Be sure to read the database guidelines in the eurodata engineering handbook.
Whenever a Mizool Playbook rule reiterates, clarifies, extends, tweaks or changes eurodata guidelines, it references the respective rule like this: ed-DB-321.
For non-eurodata developers
Unfortunately, the eurodata engineering handbook is not available to third parties. Therefore, when starting to work on a PR for an open source project that uses this Mizool Playbook, please ask the eurodata developers if there are any rules that you should know about.
Applicability to legacy code¶
Legacy code often violates many of the rules given below. Unless working on a modernization effort with an explicit strategy for dealing with (or intentionally breaking) backwards compatibility, the rules only apply to newly created elements.
In situations where old and new code meet (e.g. adding a new column to an old table), applying current conventions could lead to inconsistencies with neighboring elements. If you encounter such a case, discuss it with other developers, then choose an approach to use and make a corresponding note on the surrounding element.
Naming conventions¶
General naming¶
- Use lower case for all identifiers in databases, and use
snake_casefor multi-word identifiers (see Capitalization and format, ed-DB-115 and ed-DB-116). - Avoid abbreviations and specifically making up your own abbreviations. For details, see General rules and ed-DB-118.
-
In general, names use English. However, German terms should be used when they refer to any of the following:
- business domain terms with company-specific meaning
- terms that are hard to translate as they are specific to German laws and institutions
- objects in other databases that the application interacts with
- arguments/fields in internal protocols (e.g. REST calls, asynchronous messages) or official standards
Examples
- The company-specific terms "Beraternummer" and "Mandantennummer" are not translated.
- "bank_statement" (or "bankstatement") is okay as the domain concept ("Kontoauszug") is not at all company specific and translating it does not cause loss of meaning.
- However, it is not acceptable if it refers to a specific table in another database that uses the German term.
Info
This rule is similar to, but less strict than ed-DB-112.
Tables and views¶
- Table names use singular (
book, notbooks). See ed-DB-117. - Do not add a general prefix or suffix to the name of views to distinguish them from tables (a practice suggested by ed-DB-119).
- A side effect of this rule is that you cannot have a table
foo_statusaccompanied by av_foo_statusview, which we think would be confusing. If a view has some specific purpose, include it in the name instead, e.g.foo_status_printable. This way, it (mostly) becomes an implementation detail whether a name refers to a view or a table.
- A side effect of this rule is that you cannot have a table
Columns¶
-
Primary key column names do not repeat the name of the table (see ed-DB-119).
Example
A
booktable could have a primary key column calledidorkey, but notbook_idorbook_key. -
Do not add prefixes and suffixes to the name of a column based on its type or usage.
- Specifically, do not add
fk_to the name of a column that is (part of) a foreign key constraint (a practice suggested by ed-DB-119).
- Specifically, do not add
- Names of columns that hold ID/key values generated by another system should always unambiguously identify that system.
- You may feel that it's obvious to everybody that
external_idorsource_idrefers to the company's "SplineReticulator Directory" product. But even if that were true at the time of writing, that does not mean that the same is true for future you or somebody else in charge of deciphering the code in 5 years. - Also, following this rule makes it easier to connect additional systems (each with their own column), or migrate from one to the other.
- As an alternative, a pair of columns can be used if the source system varies between rows.
For example,
source_idholds the ID itself andsource_systemspecifies the system that ID is from.
- You may feel that it's obvious to everybody that
Keys, indexes and constraints¶
The following rules are conscious departures from ed-DB-119.
Names of keys, indexes and constraints follow the structure given below, with segments separated by double underscores (__).
| Element | Name segments | Examples |
|---|---|---|
| Primary key | table namepk |
my_cool_table__pk |
| Foreign key constraint | owning tablefkeach included column of the owning table |
book__fk__author_idcomplaint__fk__order_id__order_line_pos |
| Unique index (full or partial) |
table nameutitle / each included column |
document__u__id__account_idsubscription__u__one_active_per_account_id |
| Other index (full or partial) |
table nameititle / each included column |
group__i__parent_idorders__i__unbilled_order_numbers |
| Check constraint | table namectitle |
account__c__name_lengthcontact__c__email_or_phone_set |
Name length
PostgreSQL's maximum name length is 63 characters, which should be sufficient for most elements, even foreign key constraints with multiple columns.
In case a name constructed according to the rules above ends up being too long, abbreviate or truncate some segments. However, double check that it is still obvious which column each name segment refers to, and that there is no ambiguity.
Sequences and triggers¶
-
Sequence names consist of the corresponding table name and the suffix
__seq(same as ed-DB-119 but with__instead of_).Identity columns
Before creating a sequence, double check whether an identity column could be used instead. See the respective rule in the Table and column declarations section below.
-
Trigger names consist of the table name, two underscores and a short but meaningful title. We intentionally do not add prefixes (like
trg_from ed-DB-119) or suffixes (e.g.bufor "before update" from ed-DB-119).Examples
book__insert,book__updateandbook__deletemy_cool_table__track_historypayment__trigger_processing
Table and column declarations¶
- Define primary keys, foreign key constraints, check constraints and indexes (unique or otherwise) explicitly, not as part of a column definition.
- Always specify the name of keys, constraints and indexes. Letting the RDBMS generate names would prevent developers from documenting their intention and often makes code hard to comprehend.
- Foreign key constraints
- Each relation between tables must be covered by a foreign key constraint (see ed-DB-107).
- Only specify the clauses that deviate from the default, e.g. include
on delete cascadebut don't addon update no action. - If the referenced column is the primary key, specify only the table name without a column name:
constraint book__fk__author_id foreign key (author_id) references author -- leave out `(id)` here on delete cascade
- Primary keys should always be a surrogate key, also known as a synthetic key (see ed-DB-108).
- Using natural/business keys (like email addresses, or an employee ID generated by another system) may feel more intuitive at first, but it often causes problems later on.
- Instead, include the natural/business key as a data column, possibly with an index. However, if the data originates from another system, be careful with unique indexes as you may not be able to trust that system to enforce uniqueness at all times.
- This rule applies even if (or especially if) you cannot think of a scenario where the "obvious" natural/business key value of a row would change.
-
In PostgreSQL, columns holding database-generated integer IDs should be declared
bigint not null generated by default as identity. Compared to the explicit creation of a sequence object, such identity columns are less prone to errors, and better document the author's intention.Note
We considered using the
generated alwaysclause instead ofgenerated by default. However, we decided to use the latter as the alternative would make it very cumbersome to keep existing ID values during special operations like database moves and migrations.Still, during regular operation, our applications and scripts should never set their own values for identity columns.
-
As columns are nullable by default, omit the
nullkeyword in column definitions.
Datatypes¶
Textual data¶
-
In PostgreSQL, whenever we use the
varchardatatype, we do not specify its maximum length. Such columns can store several hundreds of thousands of characters. Not having to define the length in advance makes our SQL code more maintainable.Implementing length restrictions
If you do need to enforce a length restriction on the database level, use a check constraint instead. This way, changing the length restriction later on will not cause the RDBMS to physically rewrite the entire table.
Non-standard behavior
In ANSI SQL, the maximum length of a
varcharmust be specified. Still, the PostgreSQL behavior is so useful that we accept this tradeoff in portability. -
Avoid using PostgreSQL's
textdatatype. While it offers the same advantage as an unboundedvarchar(see above), it would prevent us from ever using that column as part of an index.
Numeric data¶
-
In PostgreSQL, our default choice for storing numbers of any kind is to use
decimalwithout specifying a precision or scale. Such columns can store several thousands of digits both before and after the decimal point.1 Not having to define precision or scale in advance makes our SQL code more maintainable.Non-standard behavior
ANSI SQL specifies that leaving out the precision or scale is equivalent to
(0), i.e. integer values. Still, the PostgreSQL behavior is so useful that we accept this tradeoff in portability. -
Only when there is a good reason to have the database coerce or round input values to a certain precision or scale, use
decimal(p,s)with appropriate values. One such reason, obviously, is that the respective project does not use PostgreSQL. -
ANSI SQL specifies that
numericanddecimalare synonyms. To avoid any potential confusion, our code exclusively uses thedecimalname for this datatype.
Boolean values¶
-
In PostgreSQL and other RDBMS that support it or an equivalent datatype, we use
booleanfor all boolean-valued columns.- Always specify
booleancolumns asnot nullunlessnullis forced upon us by the source of the data (e.g. an upstream system). Allowingnullmakes it harder to use the column later on, as it ends up having three possible states instead of two, leaving it up to debate whethernullmeansfalseortrue. -
If you do need to allow
null, be sure to add a comment explaining the meaning ofnullvalues.Examples
- "null means there was no value present when importing."
- "If the user has not overridden the default value inherited from their group config, this column is null."
- Always specify
-
If
booleanis not supported, it can be emulated with thesmallintdatatype (or equivalent). To maintain integrity and document your intention, add a check constraint that limits the column values to0and1.
Syntax¶
- Do not quote identifiers unless it is required, for example because a column name is a reserved word (e.g.
timestamp). - Write SQL keywords, names of built-in objects and other predefined words in lower case.
- Type casts
- Prefer ANSI SQL syntax for casts over RDBMs specific syntax:
cast(my_value as text) -- good! my_value::text -- bad, avoid PostgreSQL syntax where possible. - Avoid scattering type casts all over the place, it makes the code harder to read (and may even lead to back-and-forth casting which is confusing).
Instead, try to centralize them using either of the following strategies:
- near the original columns (e.g. the lowest levels of a multi-level Join) when working with raw data not usable otherwise (e.g. when dates or numbers are imported and stored as strings)
- at the latest possible step (e.g. when required by a join)
- Prefer ANSI SQL syntax for casts over RDBMs specific syntax:
-
Avoid RDBMS specific syntax whenever possible and use ANSI SQL instead.
Examples
- Oracle: use
coalesce()instead ofnvl(). - PostgreSQL: to return a single row table, use
SELECT VALUES('A Test String')instead ofSELECT 'A Test String'.
- Oracle: use
-
See the paragraph on "unconstrained numeric" columns in the numeric datatypes chapter of the PostgreSQL docs. ↩