[RFC] Code changes to prevent crashes from lack of iceberg-type column in PyIceberg-produced SQL catalogs#2092
Conversation
|
So, after getting a good night's rest I've worked out that for most of these cases it's better and easier to just However, it's still unclear what should be done for Can there be a table and view with the same name, or are they mutually exclusive? If they're mutually exclusive, drop_table and rename_table are easily solvable.
|
|
@kevinjqliu What's your opinion on the idea of scanning either (1) the table schema using backend-specific SQL or (2) selecting * from the table and seeing if there's an I'm a little skeptical of approach 2 since there could theoretically be databases with defined schema but no tables yet. |
|
Added migration and schema versioning stuff in line with discussion in #2068 |
Which issue does this PR close?
Refer to #2068
SQL Catalogs from PyIceberg do not contain the
iceberg-typecolumn in theiceberg-tablestable. I did a brief test, and from what I can tell this issue doesn't affect Glue but does impact SQLite and MySQL. I haven't tested Postgres or any other backends yet.What changes are included in this PR?
This request for comment is written to discuss how we can re-write the relevant pieces of code to detect whether this column is present and react accordingly. This is a hard problem to tackle because there's no pure-SQL way to do this that I know of.
In the attached solution, I've added a variable to SqlCatalog which tracks what backend SQLX is using under the hood, and calls a backend-specific SQL query to get a list of columns. If
iceberg-typeis not one of the columns, then the check that uses it is omitted from the query.I want to know what your thoughts are on the implemented solution and if there's a better way to go about it. I know it's not ideal to have backend-specific code in the SQL catalog but I'm at a loss for how else to do this.
Are these changes tested?
Yes. I've tested this with an SQLite catalog exported from PyIceberg. I then imported the tables to a MySQL server to test the MySQL backend-specific code I added. These tests are not exhaustive but I wanted to get comments on this approach before going further.
It also passes the standard
make testchecks on my machine.PS--There's some code mixed in here from #2079 because I'm not so good at git yet and because the code is necessary to get the SQL backends working for testing.
Thanks!
Brodie