I have an AWS Glue crawler whose S3 target path I switch in order to change the underlying table source. The problem is that tables are being created from both the old and the new target.
Crawler configuration:
aws glue get-crawler --name sand-main
{
    "Crawler": {
        "Name": "sand-main",
        "Role": "Crawler-sand",
        "Targets": {
            "S3Targets": [
                {
                    "Path": "s3://sand-main-green/main",
                    "Exclusions": [
                        "checkpoints/**",
                        "IsActive.txt",
                        "isactive.txt"
                    ]
                }
            ],
            "JdbcTargets": [],
            "MongoDBTargets": [],
            "DynamoDBTargets": [],
            "CatalogTargets": []
        },
        "DatabaseName": "sand_main",
        "Description": "",
        "Classifiers": [],
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DELETE_FROM_DATABASE"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },
        "State": "READY",
        "CrawlElapsedTime": 0,
        "CreationTime": "2020-09-30T14:07:25-06:00",
        "LastUpdated": "2021-01-28T11:32:15-07:00",
        "LastCrawl": {
            "Status": "SUCCEEDED",
            "LogGroup": "/aws-glue/crawlers",
            "LogStream": "sand-main",
            "MessagePrefix": "5bb1907d-2847-46ef-8712-3a50deb2b7a0",
            "StartTime": "2021-01-28T11:32:35-07:00"
        },
        "Version": 24,
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
    }
}
I have a Lambda that switches the path from:
"Path": "s3://sand-main-green/main"
To:
"Path": "s3://sand-main-blue/main"
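For reference, a minimal sketch of what such a switch could look like, assuming the Lambda uses boto3. The function names `switch_s3_target` and `apply_switch` are hypothetical and not taken from the actual Lambda; the sketch only illustrates replacing the `Path` of the matching S3 target while leaving the exclusions and the other target lists untouched.

```python
import copy


def switch_s3_target(targets, old_path, new_path):
    """Return a copy of a crawler's Targets dict with old_path replaced by new_path.

    `targets` is expected to have the shape of the "Targets" object returned by
    `aws glue get-crawler`. The input dict is not modified.
    """
    updated = copy.deepcopy(targets)
    for target in updated.get("S3Targets", []):
        if target.get("Path") == old_path:
            target["Path"] = new_path
    return updated


def apply_switch(crawler_name, old_path, new_path):
    """Hypothetical wrapper: read the crawler's targets and write them back with the new path."""
    import boto3  # imported lazily so switch_s3_target can be used without AWS access

    glue = boto3.client("glue")
    current = glue.get_crawler(Name=crawler_name)["Crawler"]["Targets"]
    glue.update_crawler(
        Name=crawler_name,
        Targets=switch_s3_target(current, old_path, new_path),
    )
```

With the configuration above, the call would be something like `apply_switch("sand-main", "s3://sand-main-green/main", "s3://sand-main-blue/main")`.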
But I end up with tables:
Name -> Location
test -> s3://sand-main-blue/main/test
test_2398l50df -> s3://sand-main-green/main/test
I have DeleteBehavior set to DELETE_FROM_DATABASE, so I would expect the tables for the old S3 path to be deleted. It feels like the crawler retains the history of its S3 targets. I do not want this behavior.
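To make the symptom easier to reproduce, here is a rough diagnostic sketch that lists the catalog tables whose location lies outside the crawler's current S3 target. It assumes boto3 and the table shape returned by the Glue `get_tables` call; the function names are hypothetical, and this only reports the leftover tables, it does not explain or change the crawler's behavior.

```python
def tables_outside_target(tables, active_path):
    """Return (name, location) pairs for tables not stored under active_path.

    `tables` is a list of dicts shaped like the entries of "TableList"
    in a Glue get_tables response.
    """
    prefix = active_path.rstrip("/")
    stale = []
    for table in tables:
        location = table.get("StorageDescriptor", {}).get("Location", "").rstrip("/")
        # Compare on a path boundary so ".../main2" is not treated as under ".../main".
        if location != prefix and not location.startswith(prefix + "/"):
            stale.append((table["Name"], location))
    return stale


def list_stale_tables(database_name, active_path):
    """Hypothetical wrapper: fetch all tables of a database and filter them."""
    import boto3  # imported lazily so the pure helper works without AWS access

    glue = boto3.client("glue")
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database_name):
        tables.extend(page["TableList"])
    return tables_outside_target(tables, active_path)
```

For the database above, `list_stale_tables("sand_main", "s3://sand-main-blue/main")` would be expected to report the `test_2398l50df` table that still points at the green bucket.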
question from:
https://stackoverflow.com/questions/65943326/aws-crawler-s3-target-path-changes-but-old-path-tables-included