PostgreSQL statements
PostgreSQL comes with some cool features that allow you to enhance statements. Sometimes, developers start working in a rush to solve a specific problem, which often means writing a function, a routine, or a whole program just to inspect some data. The features provided by PostgreSQL alleviate the need to do this.
This article looks at some of these features (UPSERT and RETURNING), and offers hints and tricks so that you can get as much as possible out of ordinary statements. This will not only make any problems simpler to address, but will also improve your database experience.
The UPSERT statement
The UPSERT statement is a way of performing an UPDATE over an INSERT, allowing an INSERT statement to be automatically rewritten as an UPDATE statement under certain conditions. In particular, if the INSERT statement cannot be performed due to a conflict between data that already exists and data that is going to be inserted, the UPSERT statement enables you to override this conflict.
This technique is similar to what many Object Relational Mappers (ORMs) do when a generic persistence action is performed. The ORM examines the object it has to persist (the data), and if it finds out that an object has never been seen on the database, it executes an INSERT; otherwise, it performs an UPDATE. In this way, the client code (and therefore the developer) never has to consider which database actions have to be performed to persist the data. This is also the case when using the UPSERT statement.
It is important to note that in order for a database to decide if a regular INSERT statement can proceed, or if it has to be converted into an UPDATE, a conflict must exist—otherwise there is no chance of the database understanding if the data already exists. Typically, this conflict happens against a primary key or a unique constraint. This is, effectively, what many ORMs check: if an object already has a primary key, an UPDATE is required. Otherwise, a normal INSERT statement can be issued instead.
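The examples that follow assume a files table along the lines of the following sketch. This definition is reconstructed from the columns and constraints mentioned in this article (the article itself does not show the DDL), so your actual table may differ:

```sql
CREATE TABLE files (
    pk      serial PRIMARY KEY,      -- auto-generated surrogate key
    f_name  text NOT NULL,           -- file name
    f_hash  text NOT NULL UNIQUE,    -- content hash; UNIQUE creates the files_f_hash_key constraint
    f_type  text,                    -- file type, for example 'org' or 'txt'
    f_size  numeric,                 -- file size
    ts      timestamp DEFAULT now()  -- insertion timestamp
);
```

The UNIQUE constraint on f_hash is what provides the conflict that the UPSERT examples below rely on.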
In order to better understand how an UPSERT statement works, insert the following data into the files table, as follows:
testdb=> INSERT INTO files( f_name, f_hash )
VALUES ('chapter1.org', 'f029d04a81c322f158c608596951c105'),
       ('chapter2.org', '14b8f225d4e6462022657d7285bb77ba');
Let's now say that another INSERT statement for the same file content is issued. In this case, the database has to reject the incoming data because of the unique constraint defined over the f_hash column, as shown in the following listing:
testdb=> INSERT INTO files( f_name, f_hash )
VALUES('chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba');
ERROR:  duplicate key value violates unique constraint "files_f_hash_key"
DETAIL:  Key (f_hash)=(14b8f225d4e6462022657d7285bb77ba) already exists.
In order to allow the database to accept the incoming data and perform an UPDATE to resolve the conflict, we need to specify the ON CONFLICT predicate, as shown in the following:
testdb=> INSERT INTO files( f_name, f_hash )
VALUES ('chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba')
ON CONFLICT ( f_hash )
DO UPDATE SET f_name = EXCLUDED.f_name;
When the preceding query is executed, the system finds a conflict over the unique constraint for the f_hash column. This time, however, there is an ON CONFLICT resolution strategy.
The ON CONFLICT (f_hash) DO UPDATE predicate indicates that if a conflict arises from the f_hash column, the database should perform an UPDATE instead of INSERT. In the UPDATE part of the statement, the special alias EXCLUDED represents the tuple that has been rejected due to the conflict.
In other words, the database will either perform a regular INSERT statement if the data is not already present, or it will update the f_name of the conflicting tuple.
It is worth noting that the UPDATE predicate can also include a WHERE condition in order to better filter the update to be performed. Since the excluded tuple is aliased as EXCLUDED, the current tuple must be referenced with the table name, as we can see in the following listing. In this case, only tuples with an f_type of org can be updated when a conflicting INSERT statement is issued. Files of other types will simply fail with a unique constraint violation:
testdb=> INSERT INTO files( f_name, f_hash )
VALUES ('chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba')
ON CONFLICT ( f_hash )
DO UPDATE SET f_name = EXCLUDED.f_name
WHERE files.f_type = 'org';
This is not the only strategy available, of course; it is also possible to simply do nothing at all. Note that doing nothing is different from the default behavior, which is failing on conflicts. When an INSERT statement is instructed to do nothing upon conflict, data is gracefully rejected, no error is reported, and our transaction will not abort. Everything works as if we had never executed the INSERT statement.
As shown in the following listing, all we need to do is to specify DO NOTHING within the ON CONFLICT part of the statement to ensure the conflicting tuples are gracefully thrown away:
testdb=> INSERT INTO files( f_name, f_hash )
VALUES ( 'chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba' ) ON CONFLICT ( f_hash ) DO NOTHING;
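Besides naming the conflicting columns, ON CONFLICT can also reference the constraint by name. The following sketch uses the files_f_hash_key constraint name reported in the earlier error message; it behaves the same as the column-based form above:

```sql
testdb=> INSERT INTO files( f_name, f_hash )
VALUES ( 'chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba' )
ON CONFLICT ON CONSTRAINT files_f_hash_key DO NOTHING;
```

Naming the constraint is more explicit, but ties the statement to that constraint's name; the column-based form is usually preferred.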
Getting back modified data with RETURNING
Each write statement (INSERT, UPDATE, or DELETE) supports an optional RETURNING predicate that makes the statement return a result set with the manipulated tuples. From a conceptual point of view, it is as if these INSERT, UPDATE, and DELETE commands were immediately followed by an automatic SELECT statement.
This feature is very useful. It allows us to get back the exact values of the manipulated tuples, such as an automatically generated key, a computed timestamp, or other non-predictable values.
Let's take a look at the RETURNING clause in action. Imagine that we need to insert some random data into the files table, as follows:
testdb=> INSERT INTO files( f_name, f_hash, f_size )
SELECT 'file_' || v || '.txt', md5( v::text ), v * ( random() * 100 )::int
FROM generate_series(1, 10) v;
The preceding lines generate ten records with a random f_size and f_hash, as well as an auto-generated value for the pk primary key. We therefore don't know in advance exactly which values are going to be stored in the underlying table. RETURNING solves this problem by providing us with the tuples inserted by the statement, as shown in the following listing:
testdb=> INSERT INTO files( f_name, f_hash, f_size )
SELECT 'File_' || v || '.txt',
md5( v::text || random() ), v * ( random() * 111 )::int
FROM generate_series( 1, 10 ) v
RETURNING pk, f_name, f_hash, f_size, ts;
pk | f_name | f_hash | f_size | ts
----+-------------+----------------------------------+----------+----------------------------
24 | File_1.txt | c09206052f182c8a01cd83ee0d4a7a78 | 21.0000 | 2018-10-31 09:37:29.036638
25 | File_2.txt | cec37633a67a66f99e4e427df5e40ee0 | 208.0000 | 2018-10-31 09:37:29.036638
26 | File_3.txt | afd08c4410e7600931bfcef8c3627cde | 267.0000 | 2018-10-31 09:37:29.036638
27 | File_4.txt | e8e56856ba183212b433151aeb3506cd | 384.0000 | 2018-10-31 09:37:29.036638
28 | File_5.txt | eab791d36b1fa25816d6715e628db02c | 235.0000 | 2018-10-31 09:37:29.036638
29 | File_6.txt | 552ed45e182088346cfd0503f2fef1f8 | 54.0000 | 2018-10-31 09:37:29.036638
30 | File_7.txt | 1a89ccc034a8d48b8bc92bf58d18e8bf | 679.0000 | 2018-10-31 09:37:29.036638
31 | File_8.txt | 0fe766ac50617ea7ff6d1cfb3e8060d2 | 400.0000 | 2018-10-31 09:37:29.036638
32 | File_9.txt | 063a175cf2b498dab6bf93fb8f76427a | 648.0000 | 2018-10-31 09:37:29.036638
33 | File_10.txt | 42c450d54f4fe7e29b245a3d50258f4d | 770.0000 | 2018-10-31 09:37:29.036638
As we can see in the preceding snippet, the RETURNING clause accepts the same column list as a regular SELECT statement, including the special * symbol, which makes the statement return all available columns. The following listing illustrates how to use the * symbol to get back all the columns of the deleted tuples:
testdb=> DELETE FROM files RETURNING *;
 pk |   f_name    |              f_hash              | f_type | f_size |            ts
----+-------------+----------------------------------+--------+--------+---------------------------
  1 | file_1.txt  | c4ca4238a0b923820dcc509a6f75849b |        |     54 | 2018-06-18 19:49:52.59167
  2 | file_2.txt  | c81e728d9d4c2f636f067f89cc14862c |        |     78 | 2018-06-18 19:49:52.59167
  3 | file_3.txt  | eccbc87e4b5ce2fe28308fd9f2a7baf3 |        |    153 | 2018-06-18 19:49:52.59167
  4 | file_4.txt  | a87ff679a2f3e71d9181a67b7542122c |        |    280 | 2018-06-18 19:49:52.59167
  5 | file_5.txt  | e4da3b7fbbce2345d7772b0674a318d5 |        |    160 | 2018-06-18 19:49:52.59167
  6 | file_6.txt  | 1679091c5a880faf6fb5e6087eb1b2dc |        |    234 | 2018-06-18 19:49:52.59167
  7 | file_7.txt  | 8f14e45fceea167a5a36dedd4bea2543 |        |    420 | 2018-06-18 19:49:52.59167
  8 | file_8.txt  | c9f0f895fb98ab9159f51fd0297e236d |        |    232 | 2018-06-18 19:49:52.59167
  9 | file_9.txt  | 45c48cce2e2d7fbdea1afc51c7c6ad26 |        |    396 | 2018-06-18 19:49:52.59167
 10 | file_10.txt | d3d9446802a44259755d38e6d163e820 |        |    280 | 2018-06-18 19:49:52.59167
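Because a statement with RETURNING produces a result set, it can also feed another statement through a common table expression. The following sketch (which assumes a hypothetical files_archive table with the same columns as files) moves deleted tuples into an archive in a single statement:

```sql
testdb=> WITH deleted AS (
    DELETE FROM files RETURNING *   -- the deleted tuples become a result set
)
INSERT INTO files_archive
SELECT * FROM deleted;              -- and are re-inserted into the archive
```

This pattern avoids a race between a separate SELECT and DELETE, since both halves run within the same statement.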
Note that the result provided by the RETURNING clause represents the final state of the manipulated tuples. This means that the result set is built after all of the triggers, if any, have been fired.
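For example, consider a minimal hypothetical trigger (not part of the schema used so far) that overwrites the ts column on every insert. RETURNING reports the trigger-assigned value, not the one supplied in the INSERT statement:

```sql
testdb=> CREATE FUNCTION f_set_ts() RETURNS trigger AS $$
BEGIN
    NEW.ts := now();   -- overwrite whatever value was supplied
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

testdb=> CREATE TRIGGER tr_set_ts BEFORE INSERT ON files
FOR EACH ROW EXECUTE FUNCTION f_set_ts();

-- RETURNING shows the trigger-assigned ts, not '2000-01-01'
testdb=> INSERT INTO files( f_name, f_hash, ts )
VALUES ( 'x.org', md5( 'x' ), '2000-01-01' )
RETURNING ts;
```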
The result set provided by a RETURNING predicate is also available from external client applications. For example, as shown in the following listing, it is possible to iterate over results from a Java client:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

class Returning {
    public static void main( String argv[] ) throws Exception {
        Class.forName( "org.postgresql.Driver" );
        String connectionURL = "jdbc:postgresql://localhost/testdb";
        Properties connectionProperties = new Properties();
        connectionProperties.put( "user", "luca" );
        connectionProperties.put( "password", "secret" );
        Connection conn = DriverManager.getConnection( connectionURL,
                                                       connectionProperties );

        // an INSERT with RETURNING produces a result set, so it can be
        // run with executeQuery() like an ordinary SELECT
        String query = "INSERT INTO files( f_name, f_hash, f_size ) "
                     + " SELECT 'file_' || v || '.txt',"
                     + "        md5( v::text ),"
                     + "        v * ( random() * 100 )::int"
                     + " FROM generate_series( 1, 10 ) v "
                     + " RETURNING pk, f_name, f_hash, f_size, ts;";

        Statement statement = conn.createStatement();
        ResultSet resultSet = statement.executeQuery( query );
        while ( resultSet.next() )
            System.out.println( String.format( "pk = %d, size = %d, hash = %s",
                                               resultSet.getLong( "pk" ),
                                               resultSet.getInt( "f_size" ),
                                               resultSet.getString( "f_hash" ) ) );
        resultSet.close();
        statement.close();
    }
}
When the Java code is executed, the output will look as follows:
pk = 11, size = 22, hash = c4ca4238a0b923820dcc509a6f75849b
pk = 12, size = 90, hash = c81e728d9d4c2f636f067f89cc14862c
pk = 13, size = 225, hash = eccbc87e4b5ce2fe28308fd9f2a7baf3
pk = 14, size = 368, hash = a87ff679a2f3e71d9181a67b7542122c
pk = 15, size = 460, hash = e4da3b7fbbce2345d7772b0674a318d5
pk = 16, size = 330, hash = 1679091c5a880faf6fb5e6087eb1b2dc
pk = 17, size = 140, hash = 8f14e45fceea167a5a36dedd4bea2543
pk = 18, size = 544, hash = c9f0f895fb98ab9159f51fd0297e236d
pk = 19, size = 801, hash = 45c48cce2e2d7fbdea1afc51c7c6ad26
pk = 20, size = 980, hash = d3d9446802a44259755d38e6d163e820
If you found this article helpful, you can check out PostgreSQL 11 Server Side Programming Quick Start Guide to implement effective database programming and interaction. Following a hands-on approach, PostgreSQL 11 Server Side Programming Quick Start Guide is for database administrators, data engineers, and database engineers who want to implement advanced functionalities and master complex administrative tasks with PostgreSQL 11.