Redshift order by

My first try at using Redshift.  Trying to use ORDER BY and not getting the expected result.  The output table should be sorted by column "a".

 

Here are the steps:

 

1. Create tbl_data

SELECT 3 as "a", 30 as "b"
UNION ALL
SELECT 3 as "a", 30 as "b"
UNION ALL
SELECT 2 as "a", 20 as "b"
UNION ALL
SELECT 2 as "a", 20 as "b"
UNION ALL
SELECT 1 as "a", 10 as "b"
UNION ALL
SELECT 1 as "a", 10 as "b"

2. Order and output as pd_a1

SELECT *
FROM "tbl_data"
ORDER BY "a"

Here is the result:

https://drive.google.com/open?id=1s6L4LDPijFJY6Hjrp9F6Nsh9JBeo6yH3

 

Thanks for your attention.

Comments

  • I have the same problem, but it seems to only be a problem in the preview as the end result comes out correct.

  • For me the preview and the output are the same.

  • ST_Superman
    ST_Superman Domo Employee

    This has to be a bug.  I am having the same issue that @BruceP  is having with the order by clause.

     

    Have you submitted a ticket yet?  

  • I wrote a simple query to order by closedate and got some strange but consistent results. I tried to dig into the database details of how this is being sorted, like sortkey and distkey, but didn't really know what i was doing, and creating a temp table and applying the sortkey/distkey using custom sql didn't seem to make a difference. Any ideas? orderby date.PNG

  • Gionniz
    Gionniz Contributor

    I am having issues with the order_by too,
    is it a bug, eventually? any solution?

  • Recently I have had the same problem!

    My objective was to perform a cumulative sum of an ordered dataset. I tried to run Order By with different kinds of data type (date, string of date, row number column ecc.) but for all the tests the same result = no order.

     

    I choose to switch to the ETL dataflow and perform a window & ranking function.

    Maybe there is any window functions that coul help to force the ordering  in redshift (?)

  • ST_Superman
    ST_Superman Domo Employee

    Hi Kevin,

     

    I'm not sure why this works.  But, I was getting the same odd sorting that you were:redshift order by.png

     

     

    However, if I added a rank function to it, the sort worked:

    redshift rank.png

     

     

    ... or it did when I included one extra column???

    redshift rank final.png

     

     

    Very odd.  In conclusion... I would say to cautiously use the ORDER BY in redshift.  If you do, you will want to thoroughly check your results.  This has to be a bug.  I was not able to find any documentation from redshift stating that the order by clause was broken, but it seems to be in whichever version Domo is using for these dataflows.

  • user090374
    user090374 Domo Employee

    Hey everyone - here's the explanation provided by the support team at Domo - "The ORDER BY does apply for the purpose of the query that you use it in, but Redshift returns the data back to the preview in whatever order it wants. Unfortunately the ORDER BY only applies to the query itself and not the return data."

  • Starlord
    Starlord Domo Employee

    Hi everyone,

     

    I am a Tier II Technical Advisor on the Domo Support team, and I actually field this question about Redshift a lot.

     

    What you observe about the behavior with ORDER BY in Redshift is actually expected:

     

    When Redshift processes a query that contains an ORDER BY clause, it does honor the specified order while processing the logic of that query (i.e. RANK() functions and the like do work properly, and as expected). However, when it has finished running the query, it re-orders the data to optimize the actual storage of the data (Redshift stores data tables distributed across many nodes, and splits the data up according to it's own storage optimization methods).

     

    When Domo pulls up a preview of the data, we load it out of the table after it has been re-ordered and stored by Redshift.

     

    When working with Redshift, it's important to include an ORDER BY clause in every individual query that relies on that specific ordering of the data, and never to rely on the order persisting from a prior transform.

     

    In fact, this is best-practice when working with MySQL as well, in case somebody alters your dataflow without understanding that the order of a transform's output is important.

     

    EDIT: I'm also getting this information added to our Dataflow Troubleshooting Knowledge Base article here: https://knowledge.domo.com/Prepare/DataFlow_Tips_and_Tricks/DataFlow_and_DataFusion_Troubleshooting_and_FAQ

     

    Thanks for bringing it up for discussion, and happy data processing!

     

    Starlord
    **Say “Thanks” by clicking the “heart” in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"