Creating views programmatically in Athena is not documented, and unsupported, but possible. What happens behind the scenes when you create a view using StartQueryExecution
is that Athena lets Presto create the view and then extracts Presto's internal representation and puts it in the Glue catalog.
The staleness problem usually comes from the columns in the Presto metadata and the Glue metadata being out of sync. An Athena view really contains three descriptions of the view: the view SQL, the columns and their types in Glue format, and the columns and types in Presto format. If either of these get out of sync you will get the "… is stale; it must be re-created." error.
These are the requirements on a Glue table to work as an Athena view:
TableType
must be VIRTUAL_VIEW
Parameters
must contain presto_view: true
TableInput.ViewOriginalText
must contain an encoded Presto view (see below)
StorageDescriptor.SerdeInfo
must be an empty map
StorageDescriptor.Columns
must contain all the columns that the view defines, with their types
The tricky part is the encoded Presto view. That structure is created by this code: https://github.com/prestosql/presto/blob/27a1b0e304be841055b461e2c00490dae4e30a4e/presto-hive/src/main/java/io/prestosql/plugin/hive/HiveUtil.java#L597-L600, and this is more or less what it does:
- Adds a prefix
/* Presto View:
(with a space after :
)
- Adds a base 64 encoded JSON string that contains the view SQL, the columns and their types, and some catalog metadata (see below)
- Adds a suffix
*/
(with a space before *
)
The JSON that describes the view looks like this:
- A
catalog
property that must have the value awsdatacatalog
.
- A
schema
property that must be the name of the database where the view is created (i.e. it must match the DatabaseName
property of the surrounding Glue structure.
- A list of columns, each with a
name
and type
- A
originalSql
property with the actual view SQL (not including CREATE VIEW …
, it should start with SELECT …
or WITH …
)
Here's an example:
{
"catalog": "awsdatacatalog",
"schema": "some_database",
"columns": [
{"name": "col1", "type": "varchar"},
{"name": "col2", "type": "bigint"}
],
"originalSql": "SELECT col1, col2 FROM some_other_table"
}
One caveat here is that the types of the columns are almost, but not quite, the same as the names in Glue. If Athena/Glue would have string
the value in this JSON must be varchar
. If the Athena/Glue uses array<string>
the value in this JSON must be array(varchar)
, and struct<foo:int>
becomes row(foo int)
.
This is pretty messy, and putting it all together requires some fiddling and testing. The easiest way to get it working is to create a few views and decoding working the instructions above backwards to see how they look, and then try doing it yourself.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…