User Defined Functions for transforming data

Hi, what would be the best approach to implement some UDFs on ONE DATA for data transformation?

I have for example this function:

# Longest common subsequence function (calculates a similarity index for two words)
def lcs(a, b):
    minlength = min(len(a), len(b))
    if minlength == 0:
        return 0.0  # assumption: treat an empty word as "no similarity" instead of dividing by zero
    # table[i][j] holds the LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            table[i][j] = (
                table[i - 1][j - 1] + 1 if ca == cb else
                max(table[i][j - 1], table[i - 1][j]))
    return table[-1][-1] / minlength

As an OD user, I would like to use this function in a flexible way, i.e. I don’t want to copy, paste and adapt it in every Python processor every time I need it.

Does anybody have experience with OD Functions, and could this feature help with my use case?

There are a couple of options that work, but all have some disadvantages. I assume you need to use it within a Python processor in a workflow. If your scenario were instead to reuse functions within OD Functions, the feature you would be looking for is “Environments in Functions”.

For Python processors:

  • Create an OD Function and call it via the API with the data.
    • [+] Works out of the box
    • [-] requires an API call with the data in a dedicated JSON structure
    • [-] might have issues for large data (since the data needs to be serialized and sent via HTTP request)
  • Ask the product team about the “Environments for Python Processors” feature, which was discussed but is not yet planned (to the best of my knowledge)
    • [+] Would allow you to define a custom python library with all the functions and just use “from your_lib import lcs”
    • [-] Requires product development
  • Create a custom Python library (something that can be packaged as a wheel and installed via “pip install”) and ask your friendly DevOp to install it into your pydata
    • [+] Would allow you to define any number of functions, classes, or anything else in the library and then use “from your_library import …”
    • [-] Every update needs a DevOp to install it
    • [-] I’m not sure if this works for every customer instance
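
To make the first option a bit more concrete, here is a minimal sketch of what calling a function over HTTP from a Python processor could look like. Everything specific in it is an assumption for illustration only: the endpoint URL, the bearer-token header, the request body shape ({"pairs": [...]}) and the "scores" key in the response are hypothetical and not taken from the ONE DATA API, so they would need to be adapted to whatever your function actually expects. The chunked helper is one possible way to soften the large-data concern by sending several smaller requests.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL of a deployed function will differ.
FUNCTION_URL = "https://onedata.example.com/api/functions/lcs/execute"


def build_payload(pairs):
    """Serialize (a, b) word pairs into the assumed JSON request body."""
    body = {"pairs": [{"a": a, "b": b} for a, b in pairs]}
    return json.dumps(body).encode("utf-8")


def chunked(items, size):
    """Yield slices of at most `size` items to keep each request small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_lcs_function(pairs, token, url=FUNCTION_URL, timeout=30):
    """POST the pairs to the function and return the assumed 'scores' list."""
    request = urllib.request.Request(
        url,
        data=build_payload(pairs),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["scores"]
```

In a processor you would then loop over chunked(all_pairs, 1000) and collect the results of call_lcs_function for each batch. The batch size is arbitrary here and would need tuning against your instance.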

In addition to these, there is another option that might work out of the box but is not an official feature. Contact me if you want to know more about that, but the options above are definitely “cleaner”.

Thank you for the (very complete) answer @christoph.schober. From what I understand your first suggestion sounds the most promising. We still have to figure some things out with the DevOps on the client’s side but I’ll probably get back to you once we’re ready to test this. Here’s a taco for your answer :taco: