Text assessment

Text submitted to the API is assessed using an 'assessment pipeline', which consists of a particular combination of scoring models and other assessment functionality. For example, the scoring model could be one that returns scores on the CEFR scale or on the IELTS scale. Error detection with suggested corrections is another feature that depends on the configuration of the assessment pipeline. Different API clients can have different assessment pipelines, depending on each client's requirements.

Submitting text for assessment

Request

Method and URI:

PUT /VERSION/account/ACCOUNT_ID/text/ID

Request parameters:

| Parameter | Required? | Format | Description |
|---|---|---|---|
| version | Yes | | The desired API version. |
| account_id | Yes | | Your API account ID. |
| id | Yes | Maximum 40 characters (alphanumeric or hyphen) | An ID which uniquely identifies the piece of text being submitted (for example, a UUID). A new ID must always be used when submitting a new piece of text. For example, if your system allows users to edit and resubmit their texts, a new ID must be used when resubmitting. |
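
As an illustration of the ID rule above, the following minimal sketch generates a fresh ID for each submission and builds the request path. The helper names (new_submission_id, text_path) are hypothetical and not part of the API, and the pattern assumes that "alphanumeric" means ASCII letters and digits.

```python
import re
import uuid

# Format from the parameter table: at most 40 characters, alphanumeric or hyphen.
# Assumption: "alphanumeric" means ASCII letters and digits only.
ID_PATTERN = re.compile(r"[A-Za-z0-9-]{1,40}")


def new_submission_id() -> str:
    """Return a fresh ID; a UUID4 string is 36 hex digits and hyphens."""
    return str(uuid.uuid4())


def text_path(version: str, account_id: str, submission_id: str) -> str:
    """Build the PUT path, rejecting IDs that break the documented format."""
    if not ID_PATTERN.fullmatch(submission_id):
        raise ValueError("id must be 1-40 alphanumeric or hyphen characters")
    return f"/{version}/account/{account_id}/text/{submission_id}"
```

Calling new_submission_id() again whenever a user edits and resubmits a text keeps each submission under its own ID, as the table requires.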

Request body JSON:

{"text": "Some text to be assessed", "author_id": "id of the author", "task_id" : "id of the task", "question_text": "The text of the question being asked"}

The attribute values are as follows:

| Attribute name | Required? | Format | Description |
|---|---|---|---|
| text | Yes | Maximum 10,000 UTF-8 characters, excluding most control characters except horizontal tab, line feed and carriage return | The text to be assessed, as a JSON string. |
| author_id | Yes | Maximum 40 characters (alphanumeric or hyphens) | The unique ID of the original author. |
| task_id | Yes | Maximum 40 characters (alphanumeric or hyphens) | Should identify the task for which the text was submitted. For example, the client system may present a number of different writing tasks for users to try; this attribute would indicate which task the text was written for. |
| session_id | Yes if sequence_id is specified | Maximum 40 characters (alphanumeric or hyphens) | Should identify the user session for which the text was submitted. For example, the end user may have several attempts at a particular task. This ID should distinguish each of their sessions/attempts. |
| sequence_id | No (but required if sequence_count is specified) | Maximum 4 digits (0 - 9999) | Should identify the ordering of the question within the task, if the task represents a logical grouping of questions. The values should be sortable integers, e.g. 0, 1, 2, 3. Unique combinations of task_id and sequence_id indicate a set of grouped questions. Questions that are not logically grouped should be submitted with separate, unique task_ids. |
| sequence_count | Yes if sequence_id is specified | Integer 1 - 1000 inclusive | Must specify the number of related answers being submitted as a group (i.e. with the same task_id and differing sequence_id values). |
| question_text | Yes | Maximum 10,000 UTF-8 characters, excluding most control characters except horizontal tab, line feed and carriage return | The text of the question being asked, as a JSON string. |
| test | No | 1 | If present, this must be the value 1 and indicates that the submission is a test (dummy) submission rather than a live submission from a learner. |
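
The attribute rules above could be checked on the client before sending, along the lines of the sketch below. This is not an official client: the function name build_submission_body is hypothetical, the length limit is assumed to count Unicode code points, sequence_id, sequence_count and test are assumed to be sent as JSON numbers, and the control-character rule is not checked at all.

```python
import json
import re

ID_RE = re.compile(r"[A-Za-z0-9-]{1,40}")  # assumption: ASCII alphanumerics only
MAX_TEXT_CHARS = 10_000  # assumption: the limit counts Unicode code points


def build_submission_body(text, author_id, task_id, question_text,
                          session_id=None, sequence_id=None,
                          sequence_count=None, test=False):
    """Validate the attributes client-side and return the JSON request body."""
    for name, value in (("text", text), ("question_text", question_text)):
        if not value:
            raise ValueError(f"{name} missing")
        if len(value) > MAX_TEXT_CHARS:
            raise ValueError(f"{name} too long")
    ids = {"author_id": author_id, "task_id": task_id}
    if session_id is not None:
        ids["session_id"] = session_id
    for name, value in ids.items():
        if not ID_RE.fullmatch(value or ""):
            raise ValueError(f"{name} must be 1-40 alphanumeric or hyphen characters")
    # sequence_id and sequence_count are each required when the other is given.
    if (sequence_id is None) != (sequence_count is None):
        raise ValueError("sequence_id and sequence_count must be supplied together")
    body = {"text": text, "author_id": author_id, "task_id": task_id,
            "question_text": question_text}
    if session_id is not None:
        body["session_id"] = session_id
    if sequence_id is not None:
        if session_id is None:
            raise ValueError("session_id is required when sequence_id is supplied")
        if not 0 <= sequence_id <= 9999 or not 1 <= sequence_count <= 1000:
            raise ValueError("sequence_id or sequence_count out of range")
        body["sequence_id"] = sequence_id
        body["sequence_count"] = sequence_count
    if test:
        body["test"] = 1  # the only permitted value
    return json.dumps(body)
```

The server remains the authority on validation; a check like this only avoids round trips for mistakes that the table already rules out.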

Response

Successful submission

HTTP status code: 200

Response body JSON:

{"type": "success", "code": 200}

Failed submission

Submission can fail for a number of reasons. The general response format is as follows:

HTTP status code: 400

Example response body JSON:

{"type": "error", "code": 400, "message": "id already exists"}

type is always "error". The message attribute value can vary depending on the specific error as shown by the examples below.

| Error | Code | Example message | Additional attributes |
|---|---|---|---|
| Submissions cannot be made using the same ID more than once unless the remaining parameters are identical. This allows for retry in the case of network errors, but disallows reusing the same ID for different submissions. | 400 | "id already exists" | |
| Length limits are exceeded for any parameters. | 400 | "text too long" | |
| A required attribute is missing or empty. | 400 | "text missing" | |
| A sequence_id value has been specified, but no sequence_count has been specified. | 400 | "sequence_id supplied but no sequence_count supplied" | |
| Disallowed characters are present. | 400 | "text contains invalid characters" | invalid_characters: a JSON object whose names are the attributes containing invalid characters. Each value is an array consisting of the string containing invalid characters, followed by one or more arrays containing the character indices of invalid characters. For example: "invalid_characters": {"text": ["Hel\x0Clo th\u001F\uFFFEere", [3, 4], [9, 10], [10, 11]]} |
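
A client might use the invalid_characters attribute to point the user at the offending characters. The sketch below works on an already-parsed response (a Python dict) and uses a hypothetical helper name. It assumes that each index pair is a start index followed by an exclusive end index, which is what the example in the table suggests but is not stated explicitly.

```python
def invalid_character_spans(error_response: dict) -> dict:
    """Map each offending attribute to a list of (start, end, characters) tuples."""
    spans = {}
    for attribute, entry in error_response.get("invalid_characters", {}).items():
        # The first element is the string itself; the rest are index pairs.
        value, *ranges = entry
        # Assumption: each pair is [start, end) with an exclusive end index.
        spans[attribute] = [(start, end, value[start:end]) for start, end in ranges]
    return spans


# The example from the table, written as a parsed Python dict.
response = {
    "type": "error",
    "code": 400,
    "message": "text contains invalid characters",
    "invalid_characters": {
        "text": ["Hel\x0clo th\u001f\ufffeere", [3, 4], [9, 10], [10, 11]],
    },
}
spans = invalid_character_spans(response)
```

Under that assumption, spans["text"] holds one tuple per invalid character, for example (3, 4, "\x0c") for the form feed at index 3.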

Some errors can occur before the validation of each attribute is performed:

| Error | Code | Message | Additional attributes |
|---|---|---|---|
| The request body is not valid JSON | 400 | invalid_json | |
| The request body contains byte sequences which are not valid UTF-8 | 400 | invalid_encoding | invalid_attributes: an array of invalid attribute names, with invalid byte sequences encoded using \xHH notation, where H is a hex digit. For example: "invalid_attributes": ["tex\xC2t", "task\xE3\x80_id"]. invalid_values: a JSON object whose names are the attributes with invalid values and whose values are the invalid values, with invalid byte sequences encoded using \xHH notation, where H is a hex digit. For example: "invalid_values": {"question_text": "Some \xC2text", "task\xE3\x80_id": "id\xC2three"}. The second element shows the case where both an attribute name and its value contain invalid UTF-8 (the attribute name would also appear in the invalid_attributes array). |

Retrieving text assessment results

Request

GET /VERSION/account/ACCOUNT_ID/text/ID/results

Parameters:

| Parameter | Required? | Description |
|---|---|---|
| version | Yes | The desired API version. |
| account_id | Yes | Your API account ID. |
| id | Yes | The unique ID specified in the original submission, for example abc123 if the text was submitted using the PUT /VERSION/account/1234/text/abc123 API call. |

Response

Results successfully retrieved

HTTP status code: 200

Example response body JSON:

{
 "type": "success", "code": 200, "overall_score": 7.3,
 "score_dimensions": {"prompt_relevance": 3.0},
 "sentence_scores": [[0, 5, -0.23], [6, 42, 0.56]],
 "suspect_tokens": [[0, 5], [40, 42]],
 "textual_errors": [[0, 5, "Greetings", "S"], [32, 35, "the", "MD+"]],
 "text_stats": {"r1": 0.333333, "r2": 0.103448, "r3": 0.0, "lcs": 7.0, "feature_count": 344.0, "word_count": 36.0}
}

Some of the attributes can be absent or empty, depending on the assessment pipeline in use. This is indicated in the attribute's description in the following table.

| Attribute name | Format | Description |
|---|---|---|
| type | | Always "success". |
| code | | Always 200. |
| overall_score | Floating-point number | The overall score for the piece of text. The range varies depending on the scoring model being used: for example, the default CEFR-based scale is 0.0 to 13.0 and the IELTS scale is 0.0 to 9.0. See Scoring Scales for further details. |
| score_dimensions | JSON object | This attribute may not be present, depending on the assessment pipeline being used. If present, the only possible attribute is currently prompt_relevance (a number between 0.0 and 5.0 indicating how well the answer text relates to the question text, where 0.0 is the lowest relevance and 5.0 is the highest). |
| sentence_scores | Array | A score for each sentence within the piece of text. The array may be empty. If not empty, it contains an array for each sentence in which the 3 elements are: the integer index of the sentence start, the integer index of the sentence end and a floating-point score between -1.0 and 1.0. |
| suspect_tokens | Array | Tokens (generally words) which have been identified as possibly incorrect or sub-optimal but for which the system has no suggested correction. The array may be empty. If not empty, it contains an array for each suspect token in which the 2 elements are: the integer index of the start of the token and the integer index of the end of the token. |
| textual_errors | Array | Errors identified within the piece of text for which the system can suggest a correction. The array may be empty. If not empty, it contains an array for each error in which the 4 elements are: the integer index of the start of the error, the integer index of the end of the error, the suggested correction and the error code. Refer to the appendix for a list of error codes. |
| text_stats | JSON object | This attribute may not be present, depending on the assessment pipeline being used. If present, each of the attributes within it may or may not be present. |

The attributes of text_stats are:

r1 (floating-point number): The word overlap between the question and answer text, as a proportion of the answer text
r2 (floating-point number): The bigram overlap between the question and answer text, as a proportion of the answer text
r3 (floating-point number): The trigram overlap between the question and answer text, as a proportion of the answer text
lcs (integer): The length of the longest common subsequence shared by the question and answer
feature_count (integer): A count of the features found in the answer
word_count (integer): A count of the words found in the answer
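
To show how the index-based arrays might be consumed, the sketch below substitutes each suggested correction from textual_errors into the submitted text. The function name is hypothetical and the sample text and errors are made up for illustration. It assumes that the indices are character offsets into the submitted text, that the end index is exclusive, that errors do not overlap and that every suggestion is a string; none of these is confirmed above.

```python
def apply_corrections(text: str, textual_errors: list) -> str:
    """Return the text with every suggested correction substituted in."""
    corrected = text
    # Work from the last error to the first so that earlier offsets stay valid.
    for start, end, suggestion, _code in sorted(
            textual_errors, key=lambda error: error[0], reverse=True):
        # Assumption: [start, end) is a half-open range of character offsets.
        corrected = corrected[:start] + suggestion + corrected[end:]
    return corrected


# Made-up sample data in the documented [start, end, correction, code] shape.
sample_text = "Helo wrld"
sample_errors = [[0, 4, "Hello", "S"], [5, 9, "world", "S"]]
corrected_text = apply_corrections(sample_text, sample_errors)
```

Applying the corrections from right to left matters because a correction can change the length of the text, which would shift every offset after it.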

Results not retrieved

In addition to the general possible responses outlined earlier in this document, there are a few specific reasons why results may not be retrieved.

| Reason | HTTP status code | JSON response |
|---|---|---|
| Results are not yet ready. Wait at least 1 second and try again. See also Waiting for results below. | 200 | {"type": "results_not_ready", "estimated_seconds_to_completion": 5.7, "code": 200} |
| There was insufficient English text in the answer to assign a score | 200 | {"type": "failure", "message": "insufficient_english_text", "code": 200} |
| A sentence in the answer was so long that assessment was unable to be completed | 200 | {"type": "failure", "message": "sentence_too_long", "code": 200} |
| A token (word) in the answer was so long that assessment was unable to be completed | 200 | {"type": "failure", "message": "token_too_long", "code": 200} |
| An unspecified error meant assessment of the answer was unable to be completed | 200 | {"type": "failure", "message": "unspecified_error", "code": 200} |
| No submission found with the specified id | 404 | {"type": "error", "code": 404, "message": "id not found"} |

Waiting for results

The system generally takes a few seconds to assess a piece of text. If results for a piece of text are not available when this API endpoint is called, the anticipated time remaining until the results will be available is returned in the estimated_seconds_to_completion response attribute.

A client which wants to receive results as soon as possible (for example, because it needs to return results to its users as quickly as possible) should not poll in a tight loop, but must wait at least 1 second before requesting results again. A client which does not need results as quickly as possible can of course wait an arbitrary amount of time before requesting results again. In either case, a more sophisticated approach might take into account the estimated seconds to completion, instead of polling at a fixed time interval.

However, note that the estimated seconds to completion is only a guide. Assessment of a particular piece of text may be faster or slower than expected, depending on its characteristics. If it is slower than expected, the estimated seconds to completion could reach 0 and remain there until the text has completed assessment.
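
One way to follow this guidance is sketched below. The function names, the max_attempts limit and the injected fetch callable (which is expected to perform the GET request and return the parsed JSON response) are assumptions made for illustration; only the 1 second minimum and the estimated_seconds_to_completion attribute come from the description above.

```python
import time
from typing import Callable

MIN_WAIT_SECONDS = 1.0  # the documented minimum gap between requests


def next_wait(response: dict) -> float:
    """Use the server's estimate, but never wait less than the minimum."""
    estimate = response.get("estimated_seconds_to_completion", 0.0)
    return max(MIN_WAIT_SECONDS, float(estimate))


def wait_for_results(fetch: Callable[[], dict],
                     max_attempts: int = 30,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the response type is no longer "results_not_ready"."""
    for _ in range(max_attempts):
        response = fetch()
        if response.get("type") != "results_not_ready":
            return response  # "success", "failure" or "error"
        sleep(next_wait(response))
    raise TimeoutError("results still not ready after max_attempts polls")
```

Because the estimate can sit at 0 while assessment is still running, the max() call is what keeps the loop from polling more often than once per second. The attempt limit is a client-side safeguard rather than anything the API requires.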

Deleting submissions by an author

Text submissions and their results can be deleted by specifying the author whose submissions should be deleted. If no submissions are found for the specified author ID, this API call does not return an error, but instead reports that 0 submissions were deleted.

If a deletion API call for the same author ID is made multiple times, the same result will be returned each time, assuming no submissions with that author ID are made in between. For example, if a deletion request reports that 2 submissions were deleted and the same deletion request is made again, it will still respond that 2 submissions were deleted, along with the timestamp at which the original deletion request was processed.

Request

DELETE /VERSION/account/ACCOUNT_ID/author/ID

Parameters:

| Parameter | Required? | Format | Description |
|---|---|---|---|
| version | Yes | | The desired API version. |
| account_id | Yes | | Your API account ID. |
| id | Yes | Maximum 40 characters (alphanumeric or hyphens) | The ID of the author whose data is to be deleted (the same author ID as specified by author_id when uploading a submission). |

Response

Success

HTTP status code: 200

Example response body JSON:

{
  "type":"success","code":200,"submissions_deleted":2,"timestamp":"2018-09-11T11:15:14Z"
}

| Attribute name | Format | Description |
|---|---|---|
| type | | Always "success". |
| code | | Always 200. |
| submissions_deleted | Integer | The number of submissions deleted. An integer >= 0. It will be 0 if no submissions were found containing the specified author ID. |
| timestamp | UTC timestamp string in ISO-8601 format | The time when the deletion request was processed. |

Failure

The request can fail if the author ID format is invalid.

HTTP status code: 400

Example response body JSON:

{"type": "error", "code": 400, "message": "id too long"}

type is always "error". The message attribute value can vary depending on the specific error as shown by the examples below.

| Error | Code | Example message |
|---|---|---|
| Length limit is exceeded for author ID. | 400 | "id too long" |
| Invalid format for author ID. | 400 | "id must only contain alphanumerics and hyphens" |
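
A client could catch both of these errors before making the request. The sketch below uses hypothetical helper names, assumes that "alphanumerics" means ASCII letters and digits, and checks the length before the format, which may not match the order used by the server.

```python
import re
from typing import Optional


def check_author_id(author_id: str) -> Optional[str]:
    """Return the matching example message from the table, or None if the ID looks valid."""
    if len(author_id) > 40:
        return "id too long"
    # Assumption: "alphanumerics" means ASCII letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9-]+", author_id):
        return "id must only contain alphanumerics and hyphens"
    return None


def author_path(version: str, account_id: str, author_id: str) -> str:
    """Build the DELETE path, refusing author IDs that would be rejected."""
    problem = check_author_id(author_id)
    if problem is not None:
        raise ValueError(problem)
    return f"/{version}/account/{account_id}/author/{author_id}"
```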
