Data Partitioning when storing in ONE DATA

Dear community,

Is there a way to store data in partitions in ONE DATA?

Our situation: for large amounts of data, partitioning is one of the most important techniques for improving performance. When we store data in ONE DATA, we would like to define partitions so that it can be loaded faster later on.
Is there a way to do that?

Thanks.

Sure.
Before saving, add a Manipulate Partitions processor in repartition mode (a toggle in its configuration) and select the columns to partition by. Please note that hierarchical partitioning is not possible at the moment. Moreover, the partitioning is not reflected in a folder structure, only in how the content is distributed across files. If you need explicit folder-based partitioning, please reach out to our Delivery team with a feature request.
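
To illustrate the difference between content-based distribution and a folder-based layout, here is a small conceptual sketch in plain Python. It is not how ONE DATA implements the processor; the function names, the `part-00000` file names, and the `column=value` folder naming are assumptions chosen only for illustration.

```python
import zlib
from collections import defaultdict


def repartition_by_column(rows, column, num_partitions):
    """Content-based distribution: rows sharing a key end up in the same part file.

    Hypothetical helper, not a ONE DATA API.
    """
    parts = defaultdict(list)
    for row in rows:
        # Stable hash, so the same key always maps to the same partition index.
        index = zlib.crc32(str(row[column]).encode("utf-8")) % num_partitions
        parts[f"part-{index:05d}"].append(row)
    return dict(parts)


def folder_partition_by_column(rows, column):
    """Folder-based layout: one directory per distinct key value.

    Hypothetical helper; the 'column=value' naming is an assumed convention.
    """
    parts = defaultdict(list)
    for row in rows:
        parts[f"{column}={row[column]}/part-00000"].append(row)
    return dict(parts)


rows = [
    {"country": "DE", "amount": 10},
    {"country": "FR", "amount": 20},
    {"country": "DE", "amount": 30},
]

by_content = repartition_by_column(rows, "country", num_partitions=4)
by_folder = folder_partition_by_column(rows, "country")
```

In the first variant the key is not visible in any path: a reader has to look at file contents (or file statistics) to find the relevant rows. In the second variant the key is part of the path, which is the behaviour you would have to request as a feature.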

I assume that your question concerns the Parquet and ORC formats, which @Flogge has already answered.

Just for completeness: it is also possible to partition tables in Postgres, but this is currently not supported for tables created in ONE DATA, and support for external tables is limited (you can read them from ONE DATA, but not create or update them).
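
For reference, a partitioned external table would have to be set up directly in Postgres, outside ONE DATA. The sketch below only builds the DDL strings for PostgreSQL declarative range partitioning; the helper name, the table `sales`, and its columns are made up for the example, and you would run the statements with your own database client.

```python
def range_partition_ddl(table, column, bounds):
    """Build PostgreSQL DDL for a range-partitioned table (hypothetical helper).

    `bounds` is a list of (suffix, lower, upper) tuples; the upper bound is exclusive.
    Plain string formatting is used, so pass trusted identifiers and values only.
    """
    statements = [
        # Parent table: holds no rows itself, only defines the partitioning key.
        f"CREATE TABLE {table} (id bigint, {column} date NOT NULL, amount numeric) "
        f"PARTITION BY RANGE ({column});"
    ]
    for suffix, lower, upper in bounds:
        # One child table per range of the partitioning key.
        statements.append(
            f"CREATE TABLE {table}_{suffix} PARTITION OF {table} "
            f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
        )
    return statements


ddl = range_partition_ddl(
    "sales",
    "sold_on",
    [("2023", "2023-01-01", "2024-01-01"), ("2024", "2024-01-01", "2025-01-01")],
)
print("\n".join(ddl))
```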