Dataflow creation best practice and recommendation

Golfoholic · August 2021

A lot of the dataflow creation I do is taking 2 datasets that differ slightly and don't have an easy 1:1 comparison that makes joining the data very easy. usually I am using 1 report for almost the entire visualization, and puling a column or two form the 2nd report. Since the reports aren't in an identical format, and illustrating different things I am unsure what accuracy issue I could be running in to and if there is a better way to join them together.

For Example, attached are 2 reports. 1. PL (PriceList) 2. WD (Web Discount)

The PL is the bulk of what I will utilize for the final visualization, but the 'Web Discount' column in the WD is important and the only column I am trying to pull in to the final dataset.

Both dataset have similar columns such as unit type & unit size (unit size first takes using replace text tile, and \s to get in same format as PL). the siteID and Lcode are different and but represent the same thing, so I first use a separate dataset (webform) that has all of the different SiteId's that are used throughout the organization to join with the the PL and WD to give both dataset the same identifier columns.

One of those columns from the webform with the siteid's is a site name i.e. "Huntington Beach" I take 3 columns from both datasets site name, Unit type, and unit size then use the combine columns tile to make an identical single column in both the PL and WD reports that i then use as a "key' to join on.

This seems to do the job, but I don't know enough to know if I am missing or not taking something in to account. Any recommendations or insights are appreciated.

Price List = Sample.xlsx

Web Discount - Sample.xlsx

MarkSnodgrass · August 2021

You might consider appending rather than joining. Choose the Include All Columns option when appending so that you don't lose any columns and then you have a single dataset that you can build all your cards off of.

Golfoholic · August 2021

@MarkSnodgrass Ok I will have to give that a try. My next question for this approach would be how does the append approach handle duplicated rows? Most rows will have a duplicate in the both datasets. Will it show the duplicates on separate rows, or will it recognize the duplication and eliminate one while also keeping the other columns that are unique to one of the datasets like 'Web Discount' column?

Dataflow creation best practice and recommendation

Answers

Categories