I have a Pandas dataframe that I'm writing out to Snowflake using SQLAlchemy engine and the to_sql
function. It works fine, but I have to use the chunksize
option because of some Snowflake limit. This is also fine for smaller dataframes. However, some dataframes are 500k+ rows, and at a 15k records per chunk, it takes forever to complete writing to Snowflake.
I did some research and came across the pd_writer
method provided by Snowflake, which apparently loads the dataframe much faster. My Python script does complete faster and I see it creates a table with all the right columns and the right row count, but every single column's value in every single row is NULL.
I thought it was a NaN
to NULL
issue and tried everything possible to replace the NaN
s with None
, and while it does the replacement within the dataframe, by the time it gets to the table, everything becomes NULL.
How can I use pd_writer
to get these huge dataframes written properly into Snowflake? Are there any viable alternatives?
EDIT: Following Chris' answer, I decided to try with the official example. Here's my code and the result set:
import os
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
from snowflake.connector.pandas_tools import write_pandas, pd_writer
def create_db_engine(db_name, schema_name):
return create_engine(
URL(
account=os.environ.get("DB_ACCOUNT"),
user=os.environ.get("DB_USERNAME"),
password=os.environ.get("DB_PASSWORD"),
database=db_name,
schema=schema_name,
warehouse=os.environ.get("DB_WAREHOUSE"),
role=os.environ.get("DB_ROLE"),
)
)
def create_table(out_df, table_name, idx=False):
engine = create_db_engine("dummy_db", "dummy_schema")
connection = engine.connect()
try:
out_df.to_sql(
table_name, connection, if_exists="append", index=idx, method=pd_writer
)
except ConnectionError:
print("Unable to connect to database!")
finally:
connection.close()
engine.dispose()
return True
df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["name", "balance"])
print(df.head)
create_table(df, "dummy_demo_table")
The code works fine with no hitches, but when I look at the table, which gets created, it's all NULLs. Again.
See Question&Answers more detail:
os