Is there a way to find out what number of partitions is optimal for my data? Are there any rules of thumb? Can I see it in the Spark UI?
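For reference, here is a minimal PySpark sketch of how I currently inspect the partition count (the path is just a placeholder for my dataset):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path standing in for my appended dataset
df = spark.read.parquet("/path/to/dataset")

# Number of partitions the DataFrame currently has
print("partitions:", df.rdd.getNumPartitions())

# The cluster default, often used as a starting point for tuning
print("default parallelism:", spark.sparkContext.defaultParallelism)
```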
I created a dataset by appending a small number of rows (~30-100) at a time, very frequently (1000 appends so far, with more to come). The workflows that consume this dataset are extremely slow (or hit the spark.driver.maxResultSize limit). This is solved if I use the Manipulate Partitions processor, but since these downstream workflows also have to run more than 1000 times, I would like the runtime to be as low as possible.
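In plain Spark code, what I do with the Manipulate Partitions processor roughly corresponds to this sketch (the target count of 32 is just an example value I have been trying):

```python
# Collapse the many tiny partitions created by the repeated appends.
# coalesce(n) merges partitions without a full shuffle; repartition(n)
# redistributes rows evenly but shuffles all the data.
compact = df.coalesce(32)   # 32 is an arbitrary example value
# compact = df.repartition(32)
```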
Currently, I am experimenting with different partition counts and comparing the workflow execution times. But this method is not really reliable, because many other people are working on the same instance.
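Would it make sense to look at the per-partition row counts instead of wall-clock time? Something like this sketch:

```python
# Per-partition row counts: heavy skew or thousands of near-empty
# partitions would show up here independently of cluster load.
sizes = df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
print("partitions:", len(sizes),
      "min:", min(sizes), "max:", max(sizes),
      "mean:", sum(sizes) / len(sizes))
```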