Dealing with column types & bad data
Here's the problem I've come across a few times now:
A CSV upload file has a column which looks like integers for the first hundred or so rows, but lurking further down are alpha characters. When creating a transform in magic, it loads the first 100 rows, sees they're all integers and sets the column type as such. Then, run the ETL and it fails with "Failed to parse data 'blah' as type Integer for column..."
Adding a 'set column type' tile to set it to text (and then strip non-integers) doesn't help as the error is triggered when the whole file is loaded into the ETL before it can process the column type change.
The only solution I've found is to strip out the bad data manually before uploading the file. Resolving at the datasource would be ideal, but not always possible. Has anyone found a way to deal with this without manual intevention so it just works?
Thanks.
Best Answer
-
Damien,
One question on your situation, are you looking to remove the fields with non-numeric data completely (ie, make them NULL) or are you wanting to just strip out the specific offending characters or perhaps omit the row entirely if it has bad data?
If you'll let me know how you're wanting to handle the bad data, I'll try to be more clear in my next response, but maybe the following will help.
Regardless I think you're going to need to use the SQL transform to clean your data.
When I'm looking for strictly numeric fields (0-9 only), you can do something like this:
SELECT * FROM Table WHERE `column` REGEXP '^[0-9]+$'
That would return only records where the column has numbers. If you have commas or periods in your data as well use:
SELECT * FROM Table WHERE `column` REGEXP '^[0-9]+\\.?[0-9]*$'
Again, hopefully thats somewhat helpful. If you need more assistance, just let me know a bit more about how you're wanting to handle the fields causing you problems.
Sincerely,
-ValiantSpur
0
Answers
-
Damien,
One question on your situation, are you looking to remove the fields with non-numeric data completely (ie, make them NULL) or are you wanting to just strip out the specific offending characters or perhaps omit the row entirely if it has bad data?
If you'll let me know how you're wanting to handle the bad data, I'll try to be more clear in my next response, but maybe the following will help.
Regardless I think you're going to need to use the SQL transform to clean your data.
When I'm looking for strictly numeric fields (0-9 only), you can do something like this:
SELECT * FROM Table WHERE `column` REGEXP '^[0-9]+$'
That would return only records where the column has numbers. If you have commas or periods in your data as well use:
SELECT * FROM Table WHERE `column` REGEXP '^[0-9]+\\.?[0-9]*$'
Again, hopefully thats somewhat helpful. If you need more assistance, just let me know a bit more about how you're wanting to handle the fields causing you problems.
Sincerely,
-ValiantSpur
0 -
Thanks for the response. I was hoping it would be possible in Magic but happy with SQL.
In this case I just want to make the non-numeric data null.
0
Categories
- All Categories
- 1.7K Product Ideas
- 1.7K Ideas Exchange
- 1.5K Connect
- 1.2K Connectors
- 292 Workbench
- 4 Cloud Amplifier
- 8 Federated
- 2.8K Transform
- 95 SQL DataFlows
- 600 Datasets
- 2.1K Magic ETL
- 3.7K Visualize
- 2.4K Charting
- 685 Beast Mode
- 43 App Studio
- 38 Variables
- 655 Automate
- 170 Apps
- 438 APIs & Domo Developer
- 42 Workflows
- 5 DomoAI
- 32 Predict
- 12 Jupyter Workspaces
- 20 R & Python Tiles
- 383 Distribute
- 110 Domo Everywhere
- 267 Scheduled Reports
- 6 Software Integrations
- 111 Manage
- 108 Governance & Security
- 8 Domo University
- 25 Product Releases
- Community Forums
- 39 Getting Started
- 29 Community Member Introductions
- 98 Community Announcements
- Domo Community Gallery
- 4.8K Archive