Recently, I needed to categorise pages on a website according to service offerings and compile a list of keywords for each page. Luckily, I already had a list of the pages in Tape, so I just had to add the page text and set up some automation.
I opted to use AI (specifically GPT) for the task of extracting keywords and assigning categories. The categories were relatively straightforward - a choice of six. I provided the webpage text and the six options, then instructed the AI to select the relevant categories (which could be multiple) and input the results into a multi-select field.
However, handling the keywords was a bit tricky. I wasn't sure what the AI would come back with, and unfortunately, you can't create multi-select field options on the fly via automation. So, I needed a different approach. I came up with the idea of creating a separate app to store all the Keywords and then establishing a two-way relationship with the relevant web pages.
Next, I wanted to guide the AI to use existing Keywords whenever possible. I didn't want redundant entries like "Tape API" and "API". I simply wanted "API". To achieve this, I searched within the Keywords app with no filters and used the following code to extract the name and format it appropriately for sending with the prompt:
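// Extract the title of every record returned by the Keywords search
const kw = jsonata('title').evaluate(record_collection_keywords);
// Store the titles as a JSON string so they can be sent with the prompt
var_currentkeywords = JSON.stringify(kw);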
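For anyone less familiar with JSONata, here is a minimal plain-JavaScript sketch of roughly the same step. It assumes each record in the search result exposes a `title` property, which is what the expression above implies. The sample records and the `extractTitles` helper are made up for illustration.

```javascript
// Hypothetical stand-in for the collection returned by the Keywords search.
const sampleKeywordRecords = [
  { title: 'API' },
  { title: 'Automation' },
  { title: 'Workflow' },
];

// Pull out each record's title, skipping records without a usable one.
function extractTitles(records) {
  return records
    .map((record) => record.title)
    .filter((title) => typeof title === 'string' && title.length > 0);
}

// Serialise the titles so they can be dropped straight into the prompt text.
const currentKeywords = JSON.stringify(extractTitles(sampleKeywordRecords));
console.log(currentKeywords); // prints ["API","Automation","Workflow"]
```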
I'll admit, I'm still fine-tuning the prompt for maximum benefit, but it's proving to be quite useful.
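As a rough illustration of how the existing keywords can be folded into a prompt, here is a sketch that assembles a chat-style request body. This is not my actual prompt. The `buildChatRequest` helper, the category names, and the model placeholder are all assumptions, and the body shape follows OpenAI's Chat Completions format as I understand it, so check it against their current documentation.

```javascript
// Hypothetical category options; the real six are not listed in this post.
const CATEGORY_OPTIONS = ['Consulting', 'Development', 'Training'];

// Build the JSON body for a chat-style completion request.
function buildChatRequest({ model, pageText, currentKeywords, categories }) {
  const instructions = [
    'You categorise web pages and extract keywords.',
    `Choose one or more categories from: ${categories.join(', ')}.`,
    `Reuse these existing keywords where they fit: ${JSON.stringify(currentKeywords)}.`,
    'Return at most 10 keywords.',
    'Reply with JSON only, shaped as {"category": [...], "keywords": [...]}.',
  ].join('\n');

  return {
    model,
    response_format: { type: 'json_object' }, // ask for a JSON reply
    messages: [
      { role: 'system', content: instructions },
      { role: 'user', content: pageText },
    ],
  };
}

const requestBody = buildChatRequest({
  model: 'your-model-name', // placeholder, not a real model id
  pageText: 'We build custom workflow automations on top of the Tape API.',
  currentKeywords: ['API', 'Automation'],
  categories: CATEGORY_OPTIONS,
});
console.log(JSON.stringify(requestBody, null, 2));
```

Keeping the instructions in the system message and the page text in the user message makes it easy to tweak the prompt without touching the page content.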
From there, it's just a matter of sending the "Current Keywords", webpage text, and prompt to OpenAI via their API and waiting for the results. When the response comes back, I parse out the new Keywords and categories:
const { category, keywords } = json_response;
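The destructuring above assumes the reply is already a well-formed object. If you want something more forgiving, a sketch along these lines could work. It assumes the reply arrives as a JSON string with `category` and `keywords` fields, and `parseAiReply` is a hypothetical helper name.

```javascript
// Parse the model's reply text and normalise both fields to arrays of strings.
function parseAiReply(replyText) {
  const empty = { category: [], keywords: [] };

  let parsed;
  try {
    parsed = JSON.parse(replyText);
  } catch (err) {
    // Not valid JSON: return empty results instead of throwing.
    return empty;
  }
  if (parsed === null || typeof parsed !== 'object') {
    return empty;
  }

  // Accept either a single string or an array for each field.
  const toList = (value) => {
    if (Array.isArray(value)) return value.map(String);
    if (typeof value === 'string' && value.trim() !== '') return [value];
    return [];
  };

  return {
    category: toList(parsed.category),
    keywords: toList(parsed.keywords),
  };
}

const { category, keywords } = parseAiReply(
  '{"category": "Consulting", "keywords": ["API", "Automation"]}'
);
console.log(category, keywords);
```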
Now, here's where things could be a bit more streamlined, but for now, this method has been effective and I haven't had the opportunity to explore improvements yet. I input the categories into a text field and then run a series of six conditionals, one per category. If a category name appears in the text field, then the relevant option is added to the multi-select field.
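One possible way to streamline this, sketched in plain JavaScript, is to replace the six conditionals with a single filter over the list of options. The category names and the `matchCategories` helper below are hypothetical, and writing the result back to the multi-select field would still need the usual Tape action.

```javascript
// Hypothetical option names; substitute the six real multi-select options.
const CATEGORY_OPTIONS = [
  'Consulting',
  'Development',
  'Training',
  'Support',
  'Integration',
  'Migration',
];

// Return every option whose name appears in the text, ignoring case.
function matchCategories(categoryText, options) {
  const haystack = categoryText.toLowerCase();
  return options.filter((option) => haystack.includes(option.toLowerCase()));
}

const selected = matchCategories('Consulting, training', CATEGORY_OPTIONS);
console.log(selected); // prints [ 'Consulting', 'Training' ]
```

Note that this is a plain substring check, just like a "contains" conditional, so an option name that is contained in another word would also match.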
Moving on to the new Keywords, I wanted to ensure I had clean and correct collections to work with. So, I cleared the previous collections:
Then, it's a matter of looping through each new Keyword and searching for it in the existing Keywords. If it's found, I establish a two-way relationship. If it's not, I create it and then establish the relationship before continuing with the loop.
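The find-or-create loop can be sketched outside Tape with an in-memory `Map` standing in for the Keywords app. The record shape, the `linkKeywords` helper, and the case-insensitive lookup are my assumptions for illustration; in Tape itself these steps are search, create, and relate actions.

```javascript
// In-memory stand-in for the Keywords app: lowercase title -> record.
const keywordsApp = new Map([
  ['api', { title: 'API', pages: new Set(['/pricing']) }],
]);

// Link each keyword to the page, creating the keyword record if it is missing.
function linkKeywords(pageUrl, newKeywords, app) {
  for (const raw of newKeywords) {
    const title = raw.trim();
    if (title === '') continue;

    const key = title.toLowerCase(); // assumption: match keywords ignoring case
    let record = app.get(key);
    if (!record) {
      // Not found: create the keyword before relating it to the page.
      record = { title, pages: new Set() };
      app.set(key, record);
    }
    record.pages.add(pageUrl);
  }
}

linkKeywords('/services', ['API', 'Automation'], keywordsApp);
for (const { title, pages } of keywordsApp.values()) {
  console.log(`${title}: ${pages.size} page(s)`);
}
```

Using a `Set` for the pages means that running the loop twice for the same page does not create duplicate links.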
Now, I have a comprehensive list of keywords and their relation to the web pages, along with the number of pages they're associated with. In my opinion, this approach offers more flexibility for future processing and analysis compared to using a multi-select field.
One more thing worth mentioning is that I ended up instructing the AI to provide a maximum of the 10 most relevant Keywords per page. This seemed to work better for my purposes than giving it free rein and having it return 20 per page.
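Since the AI does not always stick to the limit it is given, the cap could also be enforced after the response comes back. This is only a sketch, and the `capKeywords` helper is hypothetical; it trims whitespace, drops duplicates that differ only by case, and keeps the first 10 keywords.

```javascript
// Trim, de-duplicate (ignoring case) and cap the keyword list.
function capKeywords(keywords, limit = 10) {
  const seen = new Set();
  const result = [];
  for (const raw of keywords) {
    const keyword = String(raw).trim();
    const key = keyword.toLowerCase();
    if (keyword === '' || seen.has(key)) continue;
    seen.add(key);
    result.push(keyword);
    if (result.length === limit) break; // stop once the cap is reached
  }
  return result;
}

console.log(capKeywords(['API', 'api', ' Automation ', '']));
// prints [ 'API', 'Automation' ]
```

This relies on the AI returning keywords in order of relevance, since only the first ones survive the cut.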
I have made a quick video which may explain the process better: