Best way to get parquet data from AWS S3 bucket?

SeanPT Contributor
edited October 2022 in Connectors

We have some parquet files being replicated to an AWS S3 bucket.

I've started looking at whether I can use AWS Glue to crawl the bucket, Athena to query the Glue table, and then Domo to pull the data from Athena. I'm running into a few issues (for example, Glue picks up the initial load file as a table and reports that it has rows, but Athena can't query any data from it), but I think I can get there.
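One way to check the Athena side outside the console is a short Python script. This is only a sketch, assuming boto3 is available and that the Glue crawler has already registered a table; the function name `run_athena_query` and the database, table, and bucket names in the usage comment are placeholders, not anything from my actual setup.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, timeout_seconds=60.0):
    """Run one Athena query and return the first page of results as lists of strings.

    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    For SELECT queries the first returned row holds the column headers.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes its result files to this S3 prefix.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {query_id} still {state}")
        time.sleep(poll_seconds)

    # Only the first page of results is read here, which is enough for a sanity check.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]


# Hypothetical usage (all names below are placeholders):
# import boto3
# rows = run_athena_query(
#     boto3.client("athena"),
#     "SELECT COUNT(*) FROM my_table",
#     database="my_glue_db",
#     output_location="s3://my-bucket/athena-results/",
# )
```

If a count like this comes back as zero while Glue reports rows, that at least narrows the problem down to how Athena reads the table rather than anything on the Domo side.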

However, before I go too far down the road, is there another approach that works?

Unfortunately the S3 connector doesn't read parquet files.

I could convert them to CSV and upload them directly to Domo using something like https://stackoverflow.com/questions/62275672/converting-parquet-files-in-s3-to-csv-and-store-back-in-s3, but that seems ... kludgy?

If anyone has any suggestions, I'm all ears.

Answers