Text assessment
Text submitted to the API is assessed using an 'assessment pipeline', which consists of a particular combination of scoring models and other assessment functionality. For example, the scoring model could be one that returns scores on the CEFR scale or the IELTS scale. Error detection with suggested corrections is another feature that depends on the configuration of the assessment pipeline. Different API clients can have different assessment pipelines, depending on their requirements.
Submitting text for assessment
Request
Method and URI:
PUT /VERSION/account/ACCOUNT_ID/text/ID
Request parameters:
Parameter | Required? | Format | Description |
---|---|---|---|
version | Yes | | The desired API version. |
account_id | Yes | | Your API account ID. |
id | Yes | Maximum 40 characters (alphanumeric or hyphens) | An ID which uniquely identifies the piece of text being submitted (for example, a UUID). A new ID must always be used when submitting a new piece of text. For example, if your system allows users to edit and resubmit their texts, a new ID must be used when resubmitting. |
Request body JSON:
{"text": "Some text to be assessed", "author_id": "id of the author", "task_id" : "id of the task", "question_text": "The text of the question being asked"}
The attribute values are as follows:
Attribute name | Required? | Format | Description |
---|---|---|---|
text | Yes | Maximum 10,000 UTF-8 characters, excluding most control characters except horizontal tab, line feed and carriage return | The text to be assessed, as a JSON string. |
author_id | Yes | Maximum 40 characters (alphanumeric or hyphens) | The unique ID of the original author. |
task_id | Yes | Maximum 40 characters (alphanumeric or hyphens) | Should identify the task for which the text was submitted. For example, the client system may present a number of different writing tasks for users to try; this attribute indicates which task the text was written for. |
session_id | Yes, if sequence_id is specified | Maximum 40 characters (alphanumeric or hyphens) | Should identify the user session for which the text was submitted. For example, the end user may have several attempts at a particular task; this ID should distinguish each of their sessions/attempts. |
sequence_id | No (but required if sequence_count is specified) | Maximum 4 digits (0 to 9999) | Should identify the ordering of the question within the task, if the task represents a logical grouping of questions. This should be in the form of sortable integers, e.g. 0, 1, 2, 3. Unique combinations of task_id and sequence_id indicate a set of grouped questions. Questions that are not logically grouped should be submitted with separate, unique task_id values. |
sequence_count | Yes, if sequence_id is specified | Integer 1 to 1000 inclusive | Must specify the number of related answers being submitted as a group (i.e. with the same task_id and differing sequence_id values). |
question_text | Yes | Maximum 10,000 UTF-8 characters, excluding most control characters except horizontal tab, line feed and carriage return | The text of the question being asked, as a JSON string. |
test | No | 1 | If present, this must be the value 1 and indicates that the submission is a test (dummy) submission rather than a live submission from a learner. |
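As an illustration, the following is a minimal sketch of a submission in Python. It assumes the third-party `requests` library; the host, API version, account ID, author ID and task ID are placeholder values, and any authentication required by your account is omitted since it is not described in this section.

```python
import uuid

import requests  # third-party HTTP client, assumed for this sketch

BASE_URL = "https://api.example.com"  # placeholder host
VERSION = "v1"                        # placeholder API version
ACCOUNT_ID = "1234"                   # placeholder account ID

# A new ID must be used for every new piece of text, so a UUID is a natural fit.
submission_id = str(uuid.uuid4())

body = {
    "text": "Some text to be assessed",
    "author_id": "author-001",   # placeholder author ID
    "task_id": "task-001",       # placeholder task ID
    "question_text": "The text of the question being asked",
}

response = requests.put(
    f"{BASE_URL}/{VERSION}/account/{ACCOUNT_ID}/text/{submission_id}",
    json=body,
)
print(response.status_code, response.json())
# e.g. 200 {"type": "success", "code": 200}
```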
Response
Successful submission
HTTP status code: 200
Response body JSON:
{"type": "success", "code": 200}
Failed submission
Submission can fail for a number of reasons. The general response format is as follows:
HTTP status code: 400
Example response body JSON:
{"type": "error", "code": 400, "message": "id already exists"}
type is always "error". The message attribute value can vary depending on the specific error, as shown by the examples below.
Error | Code | Example message | Additional attributes |
---|---|---|---|
Submissions cannot be made using the same ID more than once unless the remaining parameters are identical. This allows for retries in the case of network errors, but disallows reusing the same ID for different submissions. | 400 | "id already exists" | |
Length limits are exceeded for any parameter. | 400 | "text too long" | |
A required attribute is missing or empty. | 400 | "text missing" | |
A sequence_id value has been specified, but no sequence_count has been specified. | 400 | "sequence_id supplied but no sequence_count supplied" | |
Disallowed characters are present. | 400 | "text contains invalid characters" | An attribute named invalid_characters whose value is a JSON object: its names are the attributes containing invalid characters, and each value is an array consisting of the string containing the invalid characters, followed by one or more arrays giving the character indices of the invalid characters. For example: "invalid_characters": {"text": ["Hel\x0Clo th\u001F\uFFFEere", [3, 4], [9, 10], [10, 11]]} |
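The index arrays in invalid_characters locate each offending character within the attribute's value. As a minimal, hypothetical sketch (not part of the API), a client might report them as follows, assuming each pair is a (start, end) character range and using the example error body above:

```python
# Hypothetical error body, taken from the example in the table above.
error_body = {
    "type": "error",
    "code": 400,
    "message": "text contains invalid characters",
    "invalid_characters": {
        "text": ["Hel\x0Clo th\u001f\ufffeere", [3, 4], [9, 10], [10, 11]],
    },
}

for attribute, details in error_body["invalid_characters"].items():
    offending_string, *index_ranges = details
    for start, end in index_ranges:
        bad = offending_string[start:end]
        print(f"{attribute}: invalid character {bad!r} at index {start}")
```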
Some errors can occur before the validation of each attribute is performed:
Error | Code | Message | Additional attributes |
---|---|---|---|
The request body is not valid JSON. | 400 | invalid_json | |
The request body contains byte sequences which are not valid UTF-8. | 400 | invalid_encoding | invalid_attributes: an array of invalid attribute names, with invalid byte sequences encoded using \xHH notation, where H is a hex digit. For example: "invalid_attributes": ["tex\xC2t", "task\xE3\x80_id"]. invalid_values: a JSON object whose names are the attributes with invalid values and whose values are the invalid values, with invalid byte sequences encoded using \xHH notation. For example: "invalid_values": {"question_text": "Some \xC2text", "task\xE3\x80_id": "id\xC2three"}, where the second element shows the case in which both an attribute name and its value contain invalid UTF-8 (the attribute name would also appear in the invalid_attributes array). |
Retrieving text assessment results
Request
GET /VERSION/account/ACCOUNT_ID/text/ID/results
Parameters:
Parameter | Required? | Description |
---|---|---|
version | Yes | The desired API version. |
account_id | Yes | Your API account ID. |
id | Yes | The unique ID specified in the original submission using the PUT /account/1234/text/abc123 API call (abc123 in this example). |
Response
Results successfully retrieved
HTTP status code: 200
Example response body JSON:
{
"type": "success", "code": 200, "overall_score": 7.3,
"score_dimensions": {"prompt_relevance": 3.0},
"sentence_scores": [[0, 5, -0.23], [6, 42, 0.56]],
"suspect_tokens": [[0, 5], [40, 42]],
"textual_errors": [[0, 5, "Greetings", "S"], [32, 35, "the", "MD+"]],
"text_stats": {"r1": 0.333333, "r2": 0.103448, "r3": 0.0, "lcs": 7.0, "feature_count": 344.0, "word_count": 36.0}
}
Some of the attributes can be absent or empty, depending on the assessment pipeline in use. This is indicated in each attribute's description in the following table. A sketch showing how the character indices in these attributes can be used follows the table.
Attribute name | Format | Description |
---|---|---|
type | | Always "success". |
code | | Always 200. |
overall_score | Floating-point number | The overall score for the piece of text. The range varies depending on the scoring model being used; for example, the default CEFR-based scale is 0.0 to 13.0 and the IELTS scale is 0.0 to 9.0. See Scoring Scales for further details. |
score_dimensions | JSON object | This attribute may not be present, depending on the assessment pipeline being used. If present, the only possible attribute is currently prompt_relevance (a number between 0.0 and 5.0 indicating how well the answer text relates to the question text, where 0.0 is the lowest relevance and 5.0 is the highest). |
sentence_scores | Array | A score for each sentence within the piece of text. The array may be empty. If not empty, it contains a further array for each sentence in which the 3 elements are: the integer index of the sentence start, the integer index of the sentence end, and a floating-point score between -1.0 and 1.0. |
suspect_tokens | Array | Tokens (generally words) which have been identified as possibly incorrect or sub-optimal but for which the system has no suggested correction. The array may be empty. If not empty, it contains an array for each suspect token in which the 2 elements are: the integer index of the start of the token and the integer index of the end of the token. |
textual_errors | Array | Errors identified within the piece of text for which the system can suggest a correction. The array may be empty. If not empty, it contains an array for each error in which the 4 elements are: the integer index of the start of the error, the integer index of the end of the error, the suggested correction, and the error code. Refer to the appendix for a list of error codes. |
text_stats | JSON object | This attribute may not be present, depending on the assessment pipeline being used. If present, each of the attributes within may or may not be present. The attributes are: r1 (floating-point number): the word overlap between the question and answer text, as a proportion of the answer text; r2 (floating-point number): the bigram overlap between the question and answer text, as a proportion of the answer text; r3 (floating-point number): the trigram overlap between the question and answer text, as a proportion of the answer text; lcs (integer): the longest common subsequence shared by the question and answer; feature_count (integer): a count of the features found in the answer; word_count (integer): a count of the words found in the answer. |
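Since sentence_scores, suspect_tokens and textual_errors all refer back to the submitted text by character index, a client typically slices the original text using those indices. The following is a minimal sketch of that idea, assuming the indices are (start, end) offsets into the submitted string; the function and variable names are illustrative only and are not part of the API.

```python
def summarise_results(submitted_text: str, results: dict) -> str:
    """Print sentence and token information and return the text with the
    suggested corrections applied (a sketch, not part of the API)."""
    for start, end, score in results.get("sentence_scores", []):
        print(f"sentence {submitted_text[start:end]!r} scored {score}")

    for start, end in results.get("suspect_tokens", []):
        print(f"suspect token (no suggestion available): {submitted_text[start:end]!r}")

    # Apply suggested corrections from right to left so that earlier
    # character offsets remain valid as the text changes length.
    corrected = submitted_text
    errors = sorted(results.get("textual_errors", []), key=lambda e: e[0], reverse=True)
    for start, end, suggestion, error_code in errors:
        corrected = corrected[:start] + suggestion + corrected[end:]
    return corrected
```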
Results not retrieved
In addition to the general possible responses outlined earlier in this document, there are a few specific reasons why results may not be retrieved.
Reason | HTTP status code | JSON response |
---|---|---|
Results are not yet ready. Wait at least 1 second and try again. See also Waiting for results below. | 200 | {"type": "results_not_ready", "estimated_seconds_to_completion": 5.7, "code": 200} |
There was insufficient English text in the answer to assign a score. | 200 | {"type": "failure", "message": "insufficient_english_text", "code": 200} |
A sentence in the answer was so long that assessment could not be completed. | 200 | {"type": "failure", "message": "sentence_too_long", "code": 200} |
A token (word) in the answer was so long that assessment could not be completed. | 200 | {"type": "failure", "message": "token_too_long", "code": 200} |
An unspecified error meant that assessment of the answer could not be completed. | 200 | {"type": "failure", "message": "unspecified_error", "code": 200} |
No submission was found with the specified ID. | 404 | {"type": "error", "code": 404, "message": "id not found"} |
Waiting for results
The system generally takes a few seconds to assess a piece of text. If results for a piece of text are not available when this API endpoint is called, the anticipated time remaining until the results become available is returned in the estimated_seconds_to_completion response attribute. A client which wants to receive results as soon as possible (for example, because it needs to return results to its users as quickly as possible) should not poll in a tight loop, but must wait at least 1 second before requesting results again. A client which does not need results as quickly as possible can, of course, wait an arbitrary amount of time before requesting results again. In either case, a more sophisticated approach might take the estimated seconds to completion into account instead of polling at a fixed interval. Note, however, that the estimated seconds to completion is only a guide: assessment of a particular piece of text may be faster or slower than expected, depending on its characteristics. If it is slower than expected, the estimated seconds to completion could reach 0 and remain there until assessment of the text has completed.
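The following is a minimal polling sketch along the lines described above. It assumes the third-party `requests` library and placeholder base URL, version, account ID and submission ID values; it never polls more often than once per second and treats estimated_seconds_to_completion only as a guide.

```python
import time

import requests  # third-party HTTP client, assumed for this sketch

def fetch_results(base_url, version, account_id, submission_id, timeout_seconds=120):
    """Poll the results endpoint until assessment completes or fails."""
    url = f"{base_url}/{version}/account/{account_id}/text/{submission_id}/results"
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        body = requests.get(url).json()
        if body.get("type") == "results_not_ready":
            # Wait at least 1 second, even if the estimate is lower or zero.
            time.sleep(max(1.0, body.get("estimated_seconds_to_completion", 1.0)))
            continue
        # "success", "failure" and "error" responses are returned to the caller.
        return body
    raise TimeoutError("results were not ready within the allowed time")
```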
Deleting submissions by an author
Text submissions and their results can be deleted by specifying the author whose submissions should be deleted. If no submissions are found for the specified author ID, this API call will not return an error, but will instead report that 0 submissions were deleted.
If a deletion API call for the same author ID is made multiple times, the same result will be returned, assuming no submissions with that author ID are made in between. For example, if a deletion request reports that 2 submissions were deleted and the same deletion request is made again, it will still respond that 2 submissions were deleted, along with the timestamp at which the original deletion request was processed.
Request
DELETE /VERSION/account/ACCOUNT_ID/author/ID
Parameters:
Parameter | Required? | Format | Description |
---|---|---|---|
version | Yes | | The desired API version. |
account_id | Yes | | Your API account ID. |
id | Yes | Maximum 40 characters (alphanumeric or hyphens) | The ID of the author whose data is to be deleted (the same author ID as specified by author_id when uploading a submission). |
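For example, the deletion request might be made as follows. This is a sketch assuming the third-party `requests` library and the same placeholder base URL, version, account ID and author ID values used in the earlier examples.

```python
import requests  # third-party HTTP client, assumed for this sketch

BASE_URL = "https://api.example.com"  # placeholder host
VERSION = "v1"                        # placeholder API version
ACCOUNT_ID = "1234"                   # placeholder account ID
AUTHOR_ID = "author-001"              # the author_id used when submitting

response = requests.delete(f"{BASE_URL}/{VERSION}/account/{ACCOUNT_ID}/author/{AUTHOR_ID}")
body = response.json()
if response.status_code == 200:
    print(f"{body['submissions_deleted']} submissions deleted at {body['timestamp']}")
else:
    print(f"deletion failed: {body.get('message')}")
```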
Response
Success
HTTP status code: 200
Example response body JSON:
{
"type":"success","code":200,"submissions_deleted":2,"timestamp":"2018-09-11T11:15:14Z"
}
Attribute name | Format | Description |
---|---|---|
type | | Always "success". |
code | | Always 200. |
submissions_deleted | Integer | The number of submissions deleted. An integer >= 0; it will be 0 if no submissions were found with the specified author ID. |
timestamp | UTC timestamp string in ISO-8601 format | The time when the deletion request was processed. |
Failure
The request can fail if the author ID format is invalid.
HTTP status code: 400
Example response body JSON:
{"type": "error", "code": 400, "message": "id too long"}
type is always "error". The message attribute value can vary depending on the specific error, as shown by the examples below.
Error | Code | Example message |
---|---|---|
Length limit is exceeded for the author ID. | 400 | "id too long" |
Invalid format for the author ID. | 400 | "id must only contain alphanumerics and hyphens" |