Building datasets

We help build datasets. Since 2022/23 our focus has been on datasets optimized for use with natural language models in specific domains or applications. This builds on our earlier work on datasets for various analytics applications.

Services offered

Data Preparation

We combine algorithms and manual processes to clean, validate and standardize datasets, transforming them into a reliable format for the target application (natural language models, analytics systems, etc.). We apply data normalization techniques to ensure consistency across datasets. Data issues such as duplicates, missing values and outliers are addressed, improving the integrity and usability of the data.
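
A minimal sketch of what these steps can look like in Python, using only the standard library. The record fields (`name`, `price`), the function names and the outlier threshold are hypothetical and chosen purely for illustration; a real project would pick rules that fit its own data.

```python
from statistics import median


def normalize_name(value):
    """Trim, collapse internal whitespace and lowercase a text field."""
    return " ".join(value.split()).lower()


def clean_records(records, mad_threshold=5.0):
    """Return (kept, outliers) for a list of dicts with 'name' and 'price'.

    Hypothetical example pipeline: normalize, drop incomplete records,
    remove duplicates, then set aside price outliers for manual review.
    """
    # 1. Normalize text and drop records with missing required fields.
    normalized = []
    for rec in records:
        name, price = rec.get("name"), rec.get("price")
        if not name or price is None:
            continue
        normalized.append({"name": normalize_name(name), "price": float(price)})

    # 2. Remove exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for rec in normalized:
        key = (rec["name"], rec["price"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    if not unique:
        return [], []

    # 3. Flag outliers by distance from the median, measured in units of
    #    the median absolute deviation (MAD). The threshold is an assumption.
    prices = [rec["price"] for rec in unique]
    center = median(prices)
    mad = median(abs(p - center) for p in prices)
    kept, outliers = [], []
    for rec in unique:
        is_outlier = mad > 0 and abs(rec["price"] - center) > mad_threshold * mad
        (outliers if is_outlier else kept).append(rec)
    return kept, outliers
```

Outliers are returned separately rather than deleted, so that the manual part of the process can decide whether a flagged value is an error or a genuine extreme.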

New Datasets

We use a range of tools, mostly built on open source solutions, to acquire new data. In many cases large datasets are available on the web. Using custom crawlers and bots we can efficiently collect datasets from selected verticals and data sites. Our application is not general search but the collection of domain-specific data for a well-defined purpose: we currently work on utilising natural language models for a specific industry vertical, and previously collected data for analytics systems.
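
What separates a domain-specific crawler from general search is mainly the scoping rule that decides which links to follow. The sketch below shows one possible form of that rule, using only the Python standard library. The function name, the `allowed_domains` parameter and the example hosts are hypothetical; fetching, rate limiting and robots.txt handling are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope_links(html, base_url, allowed_domains):
    """Return unique absolute http(s) links that stay within allowed_domains."""
    parser = _LinkParser()
    parser.feed(html)

    links, seen = [], set()
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        host = parts.hostname or ""
        # Accept the domain itself and any of its subdomains.
        if not any(host == d or host.endswith("." + d) for d in allowed_domains):
            continue
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

For example, calling `in_scope_links(page_html, "https://data.example.com/catalog", {"data.example.com"})` keeps links on `data.example.com` and its subdomains and discards everything else.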

Custom Solutions and Consulting

We offer bespoke data acquisition and integration solutions tailored to each customer’s business challenges and objectives. We work in an advisory role in the early phases, including feasibility studies for planned projects, and we take part in the implementation of larger projects by providing dataset-building services.