Multi-Attribute Stratification

ckatzman Contributor

Not sure if this becomes a suggestion for a future card capability, or a request in terms of is this possible today*?, but let me put this out for consideration...


I have a book of business... thousands of transactions, each transaction has hundreds of data points.  But ten of those data points are key, and i want to know the distribution of the overall book of business across each of those data points.  I could build ten cards, one for each attribute, and show the relative distribution within that attribute.  


Example:  attribute=color; values:  red, blue, green, yellow;  results:  red=35%, blue=30%, green=25%, yellow=10%.


What I'd really like to do is have something like a table card or a Sumo card type of layout, where i can add multiple attributes as series, and stack them on top of each other.  Each attribute is a series and represents 100% of the total population, but the view is stratified such that i have a stack of attributes, and each one is split by the values of that category.  That's the vertical view...


On the horizontal, i could do different things... perhaps multiple different KPIs (unit count, total value, avg cost, etc.)  Or perhaps a time-series so I can see how the mix within an attribute changes over time.


The attached screen shot shows an example with a stack of four attributes, each reflecting the potential subordinate values within that attribute and the relative distribution, across a 6-month time series.  This example happens to be at a customer level, so several of the categories show 100% in a single row rather than having greater definition as it would across a larger selection.




*I realize I could probably do this with a series of Dataflows, heavily scripted MySQL, and then bringing them together in a final dataset in a particular order, but that lacks efficiency when it needs to be repeatable in different configurations.  Being able to build it as a card with multiple value and series drag & drop capability would be a great feature.

**Say thank you by clicking the 'thumbs up'
**Be sure to select the answer that represents the best solution and mark as "Accept as Solution"


  • kshah008
    kshah008 Contributor

    Hi all,


    Can anybody help @ckatzman with their question?


  • Currently, there are not options at the card level to create the type of configuration you are looking for. On a table card, each record is used once, bucketed into a particular group based on the categories that have been selected. It appears that to get the type of output you're looking for, the dataset would need to include aggregations of the same record for multiple attributes. I think the best way to accomplish this at this time is to do as you suggested and create the metrics in a dataflow, unioning the records for the different attributes together into a final dataset that you can display on a card. Without unioning the same dataset with itself, I don't think you'll be able to use the same record in multiple attributes.


    You can also add this as a suggestion in the Ideas Exchange here. The product team watches these ideas closely and is always interested to hear about new use cases and product enhancements.

  • kshah008
    kshah008 Contributor

    @ckatzman, as zcameron suggested, please feel free to add your feature request to our Ideas section!

  • ckatzman
    ckatzman Contributor

    Been a while on this topic, had to move on to other things, but have had occasion to circle back and work on this.  I now have a dataflow created with Magic ETL...




    Essentially, each segment is looking at the entire population, but based on a different attribute, so that I can create a stratified view of the overall poopulation.  In the data set, instead of each attribute being represented in its own field (column), the union is forcing a representative value for each attribute into a pair of fields so that it stacks up neatly in a table card and looks like this...




    Nearly exactly what I'm looking for.  Each boxed section represents 100% of the population; while each transactional record only exists once in the base source dataset, within the dataflow-dataset to feed this card, each transactional record exists as many times as there are subsets being joined in.  In other words, if i have 100 records in my source data, and 10 attributes over which I want to create a stratified view, then my dataflow/card has 1000 total records to look at, but each iteration of a single transaction is now represented by a different Category/Value pair for the first two columns so that I can view the distribution across multiple attributes.


    The ONE piece that I am missing now, and have been unable to figure out so far, is how to create a column that gives a result representing the individual summary row's result as a percent of the overall total.  The challenge is that each row is itself already the result of a SUM to get to this point.  So essentially what i need is the SUM of the results of the Category/Value pair representing the row total, divided by the overall SUM of results for the Total/Total Category/Value pair.  And i cannot pre-calc this as part of the dataflow results because i have other transaction level fields included in this dataset to allow for use of Analyzer filters, so this calc has to be dynamic at the card level.


    I have an image in my head from the Domopalooza Beast Mode Boot Camp led by RScherbel showing an intersection of Beast Mode calcs, they can be horizontal (across a row), or they can be vertical (across the entire population), but I'm afraid I've painted myself into a corner because of the record-duplication structure and wanting to calc against only a portion of the results.


    I'm stuck.  Anyone have thoughts?  Ideas?  Probative questions?  Bueller...?



    **Say thank you by clicking the 'thumbs up'
    **Be sure to select the answer that represents the best solution and mark as "Accept as Solution"
  • kshah008
    kshah008 Contributor

    @ckatzman, please feel free to open a new thread with your follow-up for better exposure.

This discussion has been closed.