We all run into the situation where we have two tables with hundreds of millions of rows, drop them into the dataflow, and know that our join transforms will never match up in a preview. Could the preview pull the most recent 10k rows from the input datasets rather than a seemingly random sample? That would really improve our ability to validate the SQL as we go.