Last updated on August 13, 2024


Open-Ended-Score - AI-based evaluation of open-ended questions

Estimated reading: 6 minutes

This article explains the Open-Ended-Score (OES), which evaluates respondents' open-text answers against comprehensive quality criteria. Various algorithms and a leading AI language model are used for this purpose.

How the Open-Ended-Score works

  • First, all answers to each question are classified into one of our quality categories.
[Figure: Quality category chart]
  • Depending on the category, a score between 0 and 100 is then assigned.
  • The individual scores for each open-ended response are then used to calculate an overall Open-Ended-Score for each respondent; see the sketch after this list.
[Figure: OES scoring example]
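
The aggregation step above can be pictured with a small sketch. The exact aggregation formula is not documented in this article, so the plain average below, together with the respondent IDs and example scores, is an assumption used for illustration only.

    # Minimal sketch, not ReDem's actual implementation: combine per-answer
    # scores into one Open-Ended-Score per respondent. The plain average is
    # an assumed aggregation method, used here only for illustration.
    from statistics import mean

    # Per-answer scores as assigned via the quality categories described below.
    answer_scores = {
        "resp_001": [90, 80],   # two detailed, valid answers
        "resp_002": [30, 10],   # one »Wrong Topic« (30), one »Nonsense« (10)
    }

    overall_oes = {rid: round(mean(scores)) for rid, scores in answer_scores.items()}
    print(overall_oes)  # {'resp_001': 85, 'resp_002': 20}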

The use of GPT-4

We use OpenAI's GPT-4 model as the underlying technology for most of our quality checks. GPT-4 is currently considered one of the most advanced large language models (LLMs) and enables extremely sophisticated categorization of responses, giving us the power we need to assess the quality of open-ended responses across multiple aspects.

To ensure the highest data protection standards, GPT-4 has been integrated into the ReDem® OES as follows:

  • Open-ended responses are sent to OpenAI individually with a fully anonymized ID. Only a single response is visible to OpenAI per API request, never all responses of a survey (see the sketch after this list).
  • ReDem® acts as the sole user vis-à-vis OpenAI. OpenAI is at no time aware of the sources of the imported data.
  • The data sent by ReDem® via the API is stored by OpenAI for a period of only 30 days, after which it is completely and irreversibly deleted. The data is never used by OpenAI to train AI models.
  • Finally, a "Data Processing Agreement" was concluded between ReDem® and OpenAI based on the EU standard contractual clauses (SCC), which regulate the GDPR-compliant transfer of data to third countries. This is relevant, for example, when personal data appears in open-ended responses.
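
As a purely hypothetical illustration of the per-response, anonymized submission pattern described above, the sketch below uses the public OpenAI Python SDK; the function name, prompt wording, and category list are assumptions and do not reflect ReDem's actual implementation.

    # Hypothetical sketch of sending one anonymized response per API request.
    # Uses the public OpenAI Python SDK; prompt and model choice are assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def classify_answer(anonymized_id: str, question: str, answer: str) -> str:
        # Only a single response, identified by an anonymized ID, is visible
        # to OpenAI per request -- never a whole survey.
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "Classify the survey answer into one quality category, "
                            "e.g. Valid Answer, Wrong Topic, Nonsense, Generic Answer."},
                {"role": "user",
                 "content": f"ID: {anonymized_id}\nQuestion: {question}\nAnswer: {answer}"},
            ],
        )
        return completion.choices[0].message.content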

The ReDem® OES quality categories

We classify each response into one of our quality categories so that the respondent's score can be better understood. Our current categories cover all relevant quality aspects of open-ended responses in surveys. However, we are continuously developing them further and adding new criteria as needed.

The quality categories are displayed in all data views.

[Figures: OES quality categories in the data table, worksheet, and chart views]
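
For reference, the fixed scores that the following sections assign to each category can be collected into a single mapping. The dictionary below is our own summary for illustration, not a ReDem data structure.

    # Summary of the category scores described in the sections below
    # (illustrative only; not an actual ReDem data structure).
    CATEGORY_SCORES = {
        "Wrong Topic": 30,
        "Nonsense": 10,
        "Wrong Language": 20,
        "Duplicate Respondent (single duplicate)": 50,
        "Duplicate Respondent (multiple duplicates)": 0,
        "Duplicate Answer": 10,
        "Copy & Paste Answer": 0,
        "Fake Answer": 0,
        "Bad Language": 10,
        "Generic Answer": 50,
        "No Information": 60,
        "Valid Answer": "70-100, depending on level of detail",
    }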

Context check

  • The context check detects answers that do not fit the topic or the question. Specifically, the context of each answer is checked against the question and the provided keywords.
  • All answers that do not correspond to the expected context receive the category »Wrong Topic« and an OES of 30.
[Figure: »Wrong Topic« example]
  • Context checking can be enabled or disabled.
  • Important: You should only activate this option if your questions are meaningful enough or contain several relevant keywords.
    • When providing keywords, please ensure that the contextual scope of the keywords is sufficiently broad to avoid misinterpretations that could lead to false positives.
  • Please note that this feature can only be enabled or disabled for all open-ended questions at once.
[Figure: Enabling the context check]
  • Enter several meaningful keywords for context checking (see the configuration sketch below).
  • More precise wording of keywords improves the context recognition of our AI.
 
[Figure: Entering keywords for the context check]
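
A hypothetical configuration sketch for the context check is shown below; the field names are illustrative only and do not correspond to ReDem's actual API or file format.

    # Hypothetical context-check configuration (illustrative field names).
    open_ended_config = {
        "context_check_enabled": True,  # applies to all open-ended questions at once
        "questions": [
            {
                "id": "q_reason",
                "text": "Why did you choose this insurance provider?",
                # Broad but meaningful keywords reduce false »Wrong Topic« hits.
                "keywords": ["insurance", "price", "service", "trust", "coverage"],
            },
        ],
    }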

Nonsense Check

  • The nonsense check recognizes gibberish, strings of numbers, and other meaningless statements; a simplified illustration follows below.
  • All of these responses are labeled with the quality category »Nonsense« and assigned a score of 10.
[Figure: »Nonsense« example]
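
The actual nonsense check is AI-based; the naive heuristic below only illustrates the kind of input it flags, such as keyboard mashing or purely numeric answers.

    # Naive illustration only -- the real check is AI-based.
    import re

    def looks_like_nonsense(answer: str) -> bool:
        text = answer.strip().lower()
        if re.fullmatch(r"[\d\s.,-]*", text):        # digits/punctuation only
            return True
        letters = re.sub(r"[^a-z]", "", text)
        vowels = sum(ch in "aeiou" for ch in letters)
        # Long letter strings with almost no vowels tend to be gibberish.
        return len(letters) >= 6 and vowels / len(letters) < 0.2

    print(looks_like_nonsense("asdfghjkl"))   # True  -> »Nonsense«, score 10
    print(looks_like_nonsense("Fast delivery and friendly support."))  # False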

Language check

  • By specifying expected languages, it is possible to check whether the answers were given in the correct language. 
  • If no languages are selected, the language check is not activated.
[Figure: Selecting expected languages]
  • If a non-expected language is detected in an answer, the category »Wrong Language« is assigned and the answer receives a score of 20.
[Figure: »Wrong Language« example]
  • Please note that the question and keywords should be formulated in one of the allowed languages.
  • Questions whose answers contain no linguistic information (e.g. brand awareness questions) are unsuitable for the language check; a simplified detection sketch follows below.
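
For illustration, an off-the-shelf language detector can mimic this check; the sketch below uses the langdetect package and is not ReDem's actual implementation.

    # Illustrative language check using the langdetect package
    # (pip install langdetect) -- not ReDem's actual implementation.
    from langdetect import detect

    EXPECTED_LANGUAGES = {"de", "en"}   # languages selected for the survey

    def wrong_language(answer: str) -> bool:
        try:
            return detect(answer) not in EXPECTED_LANGUAGES
        except Exception:
            # Very short or non-linguistic answers cannot be detected reliably.
            return False

    print(wrong_language("Le service était excellent."))  # True -> »Wrong Language«, score 20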

Duplicate detection

The optional duplicate check enables the identification of fraudulent responses. Both full duplicates and partial duplicates (answers that partially match) are detected.

  • The check includes identifying answers that are given several times for the same question by different respondents.
  • Such responses are categorized as »Duplicate Respondent« and receive a score of 50 in the case of a single duplicate and a score of 0 if there are multiple duplicates.
[Figure: »Duplicate Respondent« example]
  • There is also a check for duplicates across multiple questions.
  • The corresponding responses are also classified as »Duplicate Respondent«.
[Figure: »Duplicate Respondent« example across multiple questions]

Our duplicate check also verifies whether a respondent's answers are repeated or partially repeated across several questions. Such responses are assigned the quality category »Duplicate Answer« and a score of 10.

If a response can be considered both a »Duplicate Respondent« and a »Duplicate Answer«, the »Duplicate Respondent« category takes precedence.
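
The sketch below illustrates the two rules just described on hypothetical data: identical answers to the same question (»Duplicate Respondent«) and a respondent repeating their own answer across questions (»Duplicate Answer«), with the stated precedence. The real check also detects partial duplicates and cross-question »Duplicate Respondent« cases, which this simplification omits.

    # Simplified duplicate detection on hypothetical data; partial duplicates
    # and cross-question »Duplicate Respondent« cases are omitted here.
    from collections import Counter

    # answers[respondent_id][question_id] = normalized answer text
    answers = {
        "r1": {"q1": "great service", "q2": "great service"},  # repeats own answer
        "r2": {"q1": "great service", "q2": "fast delivery"},
        "r3": {"q1": "great service", "q2": "friendly staff"},
    }

    def classify_duplicates(answers):
        categories = {}
        # »Duplicate Respondent«: the same answer occurs more than once for
        # the same question (score 50 for one duplicate, 0 for several).
        question_ids = {q for per_resp in answers.values() for q in per_resp}
        for qid in question_ids:
            counts = Counter(per_resp.get(qid) for per_resp in answers.values())
            for rid, per_resp in answers.items():
                ans = per_resp.get(qid)
                if ans is not None and counts[ans] > 1:
                    categories[(rid, qid)] = "Duplicate Respondent"
        # »Duplicate Answer«: a respondent repeats their own answer across
        # questions; »Duplicate Respondent« keeps precedence via setdefault.
        for rid, per_resp in answers.items():
            repeated = {a for a, n in Counter(per_resp.values()).items() if n > 1}
            for qid, ans in per_resp.items():
                if ans in repeated:
                    categories.setdefault((rid, qid), "Duplicate Answer")
        return categories

    print(classify_duplicates(answers))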

Copy and paste check

  • If an answer is copied and pasted into the text field inside a survey, the system automatically detects this behavior.
  • Answers that are detected as copy-and-paste are assigned the quality category »Copy & Paste Answer« and receive an OES of 0.
  • Please note that this function is only available if ReDem® is linked to your survey tool.

Detection of "Fake" responses

  • In addition, each response is checked for whether its structure follows a plausible pattern.
  • This enables the detection of answers that are thematically relevant but come from external sources such as Wikipedia.
  • Responses identified as a »Fake Answer« receive an OES of 0.

Detection of profanity

  • Both swear words and offensive answers are recognized. 
  • For such responses, the category »Bad Language« is assigned with a score of 10.
[Figure: »Bad Language« example]

Detection of generic answers

  • Generic statements such as »good«, »ok«, »anything«, »yes« and similar are classified as »Generic Answer«.
  • These responses are scored with an OES of 50.

Detection of answers without information

  • Answers lacking informational content, such as »no idea«, »nothing«, »no comment«, or »I don't know«, are assigned the category »No Information«.
  • These responses are scored with an OES of 60.

Valid answers

  • Valid responses include all responses that do not fall into any of the other quality categories.
  • In addition, the level of detail of the answers is evaluated. Answers in the »Valid Answer« quality category receive an OES between 70 and 100, depending on the level of detail; a simplified scoring sketch follows below.
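
How ReDem measures the level of detail is not documented here, so the word-count heuristic below is purely an assumption, shown only to make the 70-100 range concrete.

    # Assumed word-count heuristic for the 70-100 range (illustrative only).
    def valid_answer_score(answer: str) -> int:
        words = len(answer.split())
        if words <= 3:
            return 70
        if words <= 10:
            return 80
        if words <= 25:
            return 90
        return 100

    print(valid_answer_score("Fast and friendly delivery, packaging could be improved."))  # 80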

Supported languages

The ReDem® Open-Ended-Score supports over 100 languages, including English, German, French, Spanish, Chinese, Japanese, Swedish and many more.

Example

[Figure: OES example]

Tips for use

  • To get optimal quality from the ReDem® Open-Ended-Score, we recommend including at least two open-ended questions in your survey. These questions should be answered by as many respondents as possible.
  • We recommend using specifically worded questions that cannot be answered with »Yes« or »No«.
  • Since the OES is currently our most powerful quality indicator, we recommend including open-ended questions in as many questionnaires as possible.

If you have any further questions about the Open-Ended-Score, please contact us at business@redem.io.

Tagged: Open-Ended-Score, quality-scores
