Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Client-side SQL query formatting

Client-side SQL formatting allows running SQL queries with user-supplied parameters securely. It can be used as a simpler and more flexible alternative to prepared statements. While prepared statements expand queries server-side, SQL formatting does it client-side. Please read the comparison with prepared statements and the security considerations sections for more info.

Simple queries

with_params takes a SQL query string with placeholders and a set of parameters. When passed to execute or async_execute, the query is expanded in the client with the supplied parameters and sent to the server for execution:

std::string employee_name = get_name();  // employee_name is an untrusted string
results result;

// Expand the query and execute it. The expansion happens client-side.
// If employee_name is "John", the executed query would be:
// "SELECT id, salary FROM employee WHERE last_name = 'John'"
conn.execute(
    with_params("SELECT id, salary FROM employee WHERE last_name = {}", employee_name),
    result
);

Curly braces ({}) represent placeholders (technically called replacement fields). The notation and semantics are similar to std::format.

All fundamental types can be used as query parameters. This includes integers, floating point types, strings, blobs, dates and times:

// Will execute "SELECT id FROM employee WHERE salary > 42000"
results result;
conn.execute(with_params("SELECT id FROM employee WHERE salary > {}", 42000), result);

std::optional<T> and boost::optional<T> can also be used:

std::optional<std::int64_t> salary;  // get salary from a possibly untrusted source
results result;

// Depending on whether salary has a value or not, executes:
// "UPDATE employee SET salary = 42000 WHERE id = 1"
// "UPDATE employee SET salary = NULL WHERE id = 1"
conn.execute(with_params("UPDATE employee SET salary = {} WHERE id = {}", salary, 1), result);

Collections and ranges are supported, as long as its elements can be formatted:

results result;
std::vector<long> ids{1, 5, 20};

// Executes "SELECT * FROM employee WHERE id IN (1, 5, 20)"
conn.execute(with_params("SELECT * FROM employee WHERE id IN ({})", ids), result);

See this section for more on formatting ranges, and this table for a reference of types that have built-in support for SQL formatting.

[Note] Note

Like with std::format, the query string passed to with_params must be known at compile-time. You can skip this check using the runtime function.

Like std::format, you can use arguments with explicit indices:

// Recall that you need to set connect_params::multi_queries to true when connecting
// before running semicolon-separated queries. Executes:
// "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
results result;
conn.execute(
    with_params(
        "UPDATE employee SET first_name = {1} WHERE id = {0}; SELECT * FROM employee WHERE id = {0}",
        42,
        "John"
    ),
    result
);

See this section for a reference on the format string syntax.

Common errors and how to fix them

Not all values can be formatted. If the library finds that formatting a certain value can cause an ambiguity that could lead to a security problem, an error will be issued and the query won't be sent to the server. Here are the most common errors:

For example:

try
{
    // If the connection is using UTF-8 (the default), this will throw an error,
    // because the string to be formatted is not valid UTF-8.
    // The query never reaches the server.
    results result;
    conn.execute(with_params("SELECT {}", "bad\xff UTF-8"), result);
}
catch (const boost::system::system_error& err)
{
    BOOST_TEST(err.code() == boost::mysql::client_errc::invalid_encoding);
}

Formatting queries without executing them

with_params is handy, but may fall short in some cases involving queries with complex logic. For these cases, you can use format_sql and format_sql_to to expand a query without executing it. These APIs don't involve communication with the server.

format_sql is the simplest, and is akin to std::format:

// Compose the SQL query without executing it.
// format_opts returns a system::result<format_options>,
// contains settings like the current character set.
// If the connection is using an unknown character set, this will throw an error.
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, salary FROM employee WHERE last_name = {}",
    "Doe"
);

BOOST_TEST(query == "SELECT id, salary FROM employee WHERE last_name = 'Doe'");

format_sql requires a format_options instance describing connection configuration, like the character set currently in use. any_connection::format_opts provides an easy way to retrieve these. This section contains more info about format_opts.

Some use cases, usually involving conditionals, may not be expressible in terms of a single format string. In such cases, you can use format_context and format_sql_to to build query strings incrementally:

// Compose a query that retrieves all employees in a company,
// with an optional limit
std::string compose_select_query(
    boost::mysql::format_options opts,
    string_view company_id,
    std::optional<long> limit
)
{
    // format_context will accumulate the query as we compose it
    boost::mysql::format_context ctx(opts);

    // format_sql_to expands a format string and appends the result
    // to a format context. This way, we can build our query in smaller pieces
    // Add all the query except for the LIMIT clause
    boost::mysql::format_sql_to(ctx, "SELECT * FROM employee WHERE company_id = {}", company_id);

    if (limit)
    {
        // Add the LIMIT clause
        boost::mysql::format_sql_to(ctx, " LIMIT {}", *limit);
    }

    // Retrieve the generated query string.
    // get() returns a boost::system::result<std::string> that
    // contains an error if any of the format operations failed.
    // Calling value() will throw on error, like format_sql does
    return std::move(ctx).get().value();
}
std::string query = compose_select_query(conn.format_opts().value(), "HGS", {});
BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS'");

query = compose_select_query(conn.format_opts().value(), "HGS", 50);
BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS' LIMIT 50");

sequence uses this feature to make formatting ranges easier.

Any type that works with with_params also does with format_sql and format_sql_to. These types are said to satisfy the Formattable concept. This table summarizes such types.

Formatting ranges with sequence

The sequence function can be used when the default range formatting isn't sufficient. If the elements in your range are not formattable, you can pass a user-defined function to sequence describing how to format each element:

// Employee is a plain struct, not formattable by default
std::vector<employee> employees{
    {"John", "Doe",   "HGS"},
    {"Kate", "Smith", "AWC"},
};
std::string query = format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES {}",
    sequence(
        employees,
        [](const employee& e, format_context_base& ctx) {
            // This function will be called for each element in employees,
            // and should format a single element into the passed ctx.
            // Commas will be inserted separating elements.
            format_sql_to(ctx, "({}, {}, {})", e.first_name, e.last_name, e.company_id);
        }
    )
);
BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Kate', 'Smith', 'AWC')"
);

By default, elements are separated by commas, but this is configurable:

// A collection of filters to apply to a query
std::vector<std::pair<string_view, string_view>> filters{
    {"company_id", "HGS" },
    {"first_name", "John"},
};

std::string query = format_sql(
    conn.format_opts().value(),
    "SELECT * FROM employee WHERE {}",
    sequence(
        filters,
        [](const std::pair<string_view, string_view>& f, format_context_base& ctx) {
            // Compose a single filter
            format_sql_to(ctx, "{:i} = {}", f.first, f.second);
        },
        " AND "  // glue string: separate each element with AND clauses
    )
);

BOOST_TEST(query == "SELECT * FROM employee WHERE `company_id` = 'HGS' AND `first_name` = 'John'");

You can use sequence and with_params together.

By default, sequence copies the range you pass as parameter, making it safer for async code. You can use std::reference_wrapper or std::span to avoid such copies.

Format specifiers

Some types, like strings, can be formatted in multiple ways. As with std::format, you can select how to format them using format specifiers.

As we've seen, strings are formatted as single-quoted values by default. If you use the {:i} specifier, you can obtain dynamic SQL identifiers, instead:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, last_name FROM employee ORDER BY {:i} DESC",
    "company_id"
);

BOOST_TEST(query == "SELECT id, last_name FROM employee ORDER BY `company_id` DESC");

Specifiers are compatible with explicit indices and named arguments, too. This is equivalent to the previous snippet:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, last_name FROM employee ORDER BY {0:i} DESC",
    "company_id"
);

Prepared statements vs. client-side SQL formatting

Although both serve a similar purpose, they are fundamentally different. Prepared statements are parsed and expanded by the server. Client-side SQL expands the query in the client and sends it to the server as a string.

This means that client-side SQL does not understand your queries. It just knows about how to format MySQL types into a string without creating vulnerabilities, but otherwise treats your queries as opaque strings. Client-side SQL yields greater flexibility (you can dynamically compose any query), while statements have more limitations. This also means that you need to pay more attention to compose valid queries, specially when dealing with complex conditionals. Logic errors may lead to exploits. Please read the security considerations section for more info.

Client-side SQL entails less round-trips to the server than statements, and is usually more efficient for lightweight queries. However, it uses the less compact text protocol, which may be slower for queries retrieving a lot of data. See the efficiency considerations section for more info.

In general, use client-side SQL formatting for the following cases:

On the other hand, prefer prepared statements if:

Efficiency considerations

Both client-side SQL formatting and prepared statements have pros and cons efficiency-wise:

Security considerations

Both client-side SQL formatting and prepared statements protect against SQL injection. Statements do so by parsing the query with placeholders server-side, before performing parameter substitution. Client-side SQL quotes and escapes your values to avoid injection, but does not understand your queries.

This means that you need to ensure that your queries always expand to valid SQL. This is trivial for simple queries, but may be an issue with more complex ones, involving ranges or dynamic identifiers. For instance, the following query may expand to invalid SQL if the provided range is empty:

// If ids.empty(), generates "SELECT * FROM employee WHERE id IN ()", which is a syntax error.
// This is not a security issue for this query, but may be exploitable in more involved scenarios.
// Queries involving only scalar values (as opposed to ranges) are not affected by this.
// It is your responsibility to check for conditions like ids.empty(), as client-side SQL
// formatting does not understand your queries.
std::vector<int> ids;
auto q = format_sql(conn.format_opts().value(), "SELECT * FROM employee WHERE id IN ({})", ids);

The risk is higher if you're building your query by pieces using format_sql_to.

To sum up:

Types with built-in support for SQL formatting

C++ type

Formatted as...

Example

signed char, short, int, long, long long

Integral literal
No format specifiers allowed

format_sql(opts, "SELECT {}", 42) == "SELECT 42"
format_sql(opts, "SELECT {}", -1) == "SELECT -1"

unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long

Integral literal
No format specifiers allowed

format_sql(opts, "SELECT {}", 42u) == "SELECT 42"

bool

Integral literal 1 if true, 0 if false
No format specifiers allowed

format_sql(opts, "SELECT {}", false) == "SELECT 0"
format_sql(opts, "SELECT {}", true) == "SELECT 1"

String types (convertible to string_view), including:

std::string

string_view

std::string_view

const char*

Without format specifiers: single-quoted escaped string literal. Note that LIKE special characters (% and _) are not escaped.

i format specifier: backtick-quoted, escaped SQL identifier.

r format specifier: raw, unescaped SQL. Warning: use this specifier with caution.

// Without format specifier: escaped, quoted string value
format_sql(opts, "SELECT {}", "Hello world") == "SELECT 'Hello world'"
format_sql(opts, "SELECT {}", "Hello 'world'") == R"(SELECT 'Hello \'world\'')"

// {:i}: escaped, quoted dynamic identifier
format_sql(opts, "SELECT {:i} FROM t", "salary") == "SELECT `salary` FROM t"
format_sql(opts, "SELECT {:i} FROM t", "sal`ary") == "SELECT `sal``ary` FROM t"

// {:r}: raw, unescaped SQL. WARNING: incorrect use can cause vulnerabilities
format_sql(opts, "SELECT * FROM t WHERE id = 42 {:r} salary > 20000", "OR") ==
    "SELECT * FROM t WHERE id = 42 OR salary > 20000"

Blob types (convertible to span<const unsigned char>), including:

blob (std::vector<unsigned char>)

blob_view (span<const unsigned char>)

std::array<unsigned char, N>

Hex string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", blob{0x00, 0x48, 0xff}) == R"(SELECT x'0048ff')"

float, except NaN and inf

Floating-point literal, after casting to double.
MySQL does not support NaNs and infinities. Attempting to format these cause a client_errc::unformattable_value error.
No format specifiers allowed.

// Equivalent to format_sql(opts, "SELECT {}", static_cast<double>(4.2f))
// Note that MySQL uses doubles for all floating point literals
format_sql(opts, "SELECT {}", 4.2f) == "SELECT 4.199999809265137e+00"

double, except NaN and inf

Floating-point literal.
MySQL does not support NaNs and infinities. Attempting to format these cause a client_errc::unformattable_value error.
No format specifiers allowed.

format_sql(opts, "SELECT {}", 4.2) == "SELECT 4.2e+00"

date

Single quoted, DATE-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", date(2021, 1, 2)) == "SELECT '2021-01-02'"

datetime

Single quoted DATETIME-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", datetime(2021, 1, 2, 23, 51, 14)) == "SELECT '2021-01-02 23:51:14.000000'"

time and std::chrono::duration types convertible to time

Single quoted TIME-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", std::chrono::seconds(121)) == "SELECT '00:02:01.000000'"

std::nullptr_t

NULL
No format specifiers allowed

format_sql(opts, "SELECT {}", nullptr) == "SELECT NULL"

boost::optional<T> and std::optional<T>, T being one of the fundamental types above.
Not applicable to custom types or ranges.
No format specifiers allowed

Formats the underlying value if there is any.
NULL otherwise.

format_sql(opts, "SELECT {}", optional<int>(42)) == "SELECT 42"
format_sql(opts, "SELECT {}", optional<int>()) == "SELECT NULL"

field and field_view

Formats the underlying value.
No format specifiers allowed

format_sql(opts, "SELECT {}", field(42)) == "SELECT 42"
format_sql(opts, "SELECT {}", field("abc")) == "SELECT 'abc'"
format_sql(opts, "SELECT {}", field()) == "SELECT NULL"

Range of formattable elements. Informally, such ranges support std::begin() and std::end(), and its iterator operator* must yield one of the following:

  • A WritableField (i.e. one of the fundamental types above).
  • A type with a custom formatter.

Ranges of ranges are not supported. Note that vector<unsigned char> and similar types are formatted as blobs, not as sequences.

See the Formattable concept reference for a formal definition.

Formats each element in the range, separating elements with commas.
Specifiers can be applied to individual elements by prefixing them with a colon (:)

// long is a WritableField
format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20}) == "SELECT 1, 5, 20"

// C++20 ranges and other custom ranges accepted
format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20} | std::ranges::views::take(2)) ==
"SELECT 1, 5"

// Apply the 'i' specifier to each element in the sequence
format_sql(
    opts,
    "SELECT {::i} FROM employee",
    std::vector<string_view>{"first_name", "last_name"}
) == "SELECT `first_name`, `last_name` FROM employee"

format_sequence (as returned by sequence)

Formats each element in a range by calling a user-supplied function, separating elements by a glue string (a comma by default).
No format specifiers allowed

format_sql(
    opts,
    "SELECT {}",
    sequence(
        std::vector<int>{1, 5, 20},
        [](int val, format_context_base& ctx) { format_sql_to(ctx, "{}+1", val); }
    )
) == "SELECT 1+1, 5+1, 20+1"

Custom type that specializes formatter

Calls formatter::parse and formatter::format
May accept user-defined format specifiers.

formattable_ref

Formats the underlying value. Can represent any of the types above.
Accepts the same format specifiers as the underlying type.

format_sql(opts, "SELECT {}", formattable_ref(42)) == "SELECT 42"
format_sql(opts, "SELECT {:i} FROM t", formattable_ref("salary")) == "SELECT `salary` FROM t"

PrevUpHomeNext