Text assessment
Text submitted to the API is assessed using an 'assessment pipeline', which consists of a particular combination of scoring models and other assessment functionality. For example, the scoring model could be one that returns scores on the CEFR scale or the IELTS scale. Error detection with suggested corrections is another feature that depends on the configuration of the assessment pipeline. Different API clients can have different assessment pipelines, depending on their requirements.
Submitting text for assessment
Request
Method and URI:
PUT /VERSION/account/ACCOUNT_ID/text/ID
Request parameters:
Parameter | Required? | Format | Description |
---|---|---|---|
version | Yes | | The desired API version. |
account_id | Yes | | Your API account ID. |
id | Yes | Maximum 40 characters (alphanumeric or hyphens) | An ID which uniquely identifies the piece of text being submitted (for example, a UUID). A new ID must always be used when submitting a new piece of text. For example, if your system allows users to edit and resubmit their texts, a new ID must be used when resubmitting. |
Request body JSON:
{"text": "Some text to be assessed", "author_id": "id of the author", "task_id" : "id of the task", "question_text": "The text of the question being asked"}
The attribute values are as follows:
Attribute name | Required? | Format | Description |
---|---|---|---|
text | Yes | Maximum 10,000 UTF-8 characters, excluding most control characters except horizontal tab, line feed and carriage return | The text to be assessed, as a JSON string. |
author_id | Yes | Maximum 40 characters (alphanumeric or hyphens) | The unique ID of the original author. |
task_id | Yes | Maximum 40 characters (alphanumeric or hyphens) | Should identify the task for which the text was submitted. For example, the client system may present a number of different writing tasks for users to try; this attribute indicates which task the text was written for. |
session_id | Yes, if sequence_id is specified | Maximum 40 characters (alphanumeric or hyphens) | Should identify the user session for which the text was submitted. For example, the end user may have several attempts at a particular task; this ID should distinguish each of their sessions/attempts. |
sequence_id | No (but required if sequence_count is specified) | Maximum 4 digits (0 to 9999) | Should identify the ordering of the question within the task, if the task represents a logical grouping of questions. This should be in the form of sortable integers, e.g. 0, 1, 2, 3. Unique combinations of task_id and sequence_id indicate a set of grouped questions. Questions that are not logically grouped should be submitted with separate, unique task_id values. |
sequence_count | Yes, if sequence_id is specified | Integer 1 to 1000 inclusive | Must specify the number of related answers being submitted as a group (i.e. with the same task_id and differing sequence_id values). |
question_text | Yes | Maximum 10,000 UTF-8 characters, excluding most control characters except horizontal tab, line feed and carriage return | The text of the question being asked, as a JSON string. |
test | No | 1 | If present, this must be the value 1 and indicates that the submission is a test (dummy) submission rather than a live submission from a learner. |
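As an illustration, the following is a minimal sketch of a submission in Python. It assumes the third-party `requests` library; the host, API version, account ID, author ID and task ID are placeholder values, and any authentication required by your account is omitted since it is not described in this section.

```python
import uuid

import requests  # third-party HTTP client, assumed for this sketch

BASE_URL = "https://api.example.com"  # placeholder host
VERSION = "v1"                        # placeholder API version
ACCOUNT_ID = "1234"                   # placeholder account ID

# A new ID must be used for every new piece of text, so a UUID is a natural fit.
submission_id = str(uuid.uuid4())

body = {
    "text": "Some text to be assessed",
    "author_id": "author-001",   # placeholder author ID
    "task_id": "task-001",       # placeholder task ID
    "question_text": "The text of the question being asked",
}

response = requests.put(
    f"{BASE_URL}/{VERSION}/account/{ACCOUNT_ID}/text/{submission_id}",
    json=body,
)
print(response.status_code, response.json())
# e.g. 200 {"type": "success", "code": 200}
```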
Response
Successful submission
HTTP status code: 200
Response body JSON:
{"type": "success", "code": 200}
Failed submission
Submission can fail for a number of reasons. The general response format is as follows:
HTTP status code: 400
Example response body JSON:
{"type": "error", "code": 400, "message": "id already exists"}
type is always "error". The message attribute value can vary depending on the specific error, as shown by the examples below.
Error | Code | Example message | Additional attributes |
---|---|---|---|
Submissions cannot be made using the same ID more than once unless the remaining parameters are identical. This allows for retries in the case of network errors, but disallows reusing the same ID for different submissions. | 400 | "id already exists" | |
Length limits are exceeded for any parameter. | 400 | "text too long" | |
A required attribute is missing or empty. | 400 | "text missing" | |
A sequence_id value has been specified, but no sequence_count has been specified. | 400 | "sequence_id supplied but no sequence_count supplied" | |
Disallowed characters are present. | 400 | "text contains invalid characters" | An attribute named invalid_characters whose value is a JSON object: its names are the attributes containing invalid characters, and each value is an array consisting of the string containing the invalid characters, followed by one or more arrays giving the character indices of the invalid characters. For example: "invalid_characters": {"text": ["Hel\x0Clo th\u001F\uFFFEere", [3, 4], [9, 10], [10, 11]]} |
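The index arrays in invalid_characters locate each offending character within the attribute's value. As a minimal, hypothetical sketch (not part of the API), a client might report them as follows, assuming each pair is a (start, end) character range and using the example error body above:

```python
# Hypothetical error body, taken from the example in the table above.
error_body = {
    "type": "error",
    "code": 400,
    "message": "text contains invalid characters",
    "invalid_characters": {
        "text": ["Hel\x0Clo th\u001f\ufffeere", [3, 4], [9, 10], [10, 11]],
    },
}

for attribute, details in error_body["invalid_characters"].items():
    offending_string, *index_ranges = details
    for start, end in index_ranges:
        bad = offending_string[start:end]
        print(f"{attribute}: invalid character {bad!r} at index {start}")
```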
Some errors can occur before the validation of each attribute is performed:
Error | Code | Message | Additional attributes |
---|---|---|---|
The request body is not valid JSON. | 400 | invalid_json | |
The request body contains byte sequences which are not valid UTF-8. | 400 | invalid_encoding | invalid_attributes: an array of invalid attribute names, with invalid byte sequences encoded using \xHH notation, where H is a hex digit. For example: "invalid_attributes": ["tex\xC2t", "task\xE3\x80_id"]. invalid_values: a JSON object whose names are the attributes with invalid values and whose values are the invalid values, with invalid byte sequences encoded using \xHH notation. For example: "invalid_values": {"question_text": "Some \xC2text", "task\xE3\x80_id": "id\xC2three"}, where the second element shows the case in which both an attribute name and its value contain invalid UTF-8 (the attribute name would also appear in the invalid_attributes array). |
Retrieving text assessment results
Request
GET /VERSION/account/ACCOUNT_ID/text/ID/results
Parameters:
Parameter | Required? | Description |
---|---|---|
version | Yes | The desired API version. |
account_id | Yes | Your API account ID. |
id | Yes | The unique ID specified in the original submission using the PUT /account/1234/text/abc123 API call (abc123 in this example). |
Response
Results successfully retrieved
HTTP status code: 200
Example response body JSON:
{
"type": "success", "code": 200, "overall_score": 7.3,
"score_dimensions": {"prompt_relevance": 3.0},
"sentence_scores": [[0, 5, -0.23], [6, 42, 0.56]],
"suspect_tokens": [[0, 5], [40, 42]],
"textual_errors": [[0, 5, "Greetings", "S"], [32, 35, "the", "MD+"]],
"text_stats": {"r1": 0.333333, "r2": 0.103448, "r3": 0.0, "lcs": 7.0, "feature_count": 344.0, "word_count": 36.0}
}
Some of the attributes can be absent or empty, depending on the assessment pipeline in use. This is indicated in each attribute's description in the following table. A sketch showing how the character indices in these attributes can be used follows the table.
Attribute name | Format | Description |
---|---|---|
type | | Always "success". |
code | | Always 200. |
overall_score | Floating-point number | The overall score for the piece of text. The range varies depending on the scoring model being used; for example, the default CEFR-based scale is 0.0 to 13.0 and the IELTS scale is 0.0 to 9.0. See Scoring Scales for further details. |
score_dimensions | JSON object | This attribute may not be present, depending on the assessment pipeline being used. If present, the only possible attribute is currently prompt_relevance (a number between 0.0 and 5.0 indicating how well the answer text relates to the question text, where 0.0 is the lowest relevance and 5.0 is the highest). |
sentence_scores | Array | A score for each sentence within the piece of text. The array may be empty. If not empty, it contains a further array for each sentence in which the 3 elements are: the integer index of the sentence start, the integer index of the sentence end, and a floating-point score between -1.0 and 1.0. |
suspect_tokens | Array | Tokens (generally words) which have been identified as possibly incorrect or sub-optimal but for which the system has no suggested correction. The array may be empty. If not empty, it contains an array for each suspect token in which the 2 elements are: the integer index of the start of the token and the integer index of the end of the token. |
textual_errors | Array | Errors identified within the piece of text for which the system can suggest a correction. The array may be empty. If not empty, it contains an array for each error in which the 4 elements are: the integer index of the start of the error, the integer index of the end of the error, the suggested correction, and the error code. Refer to the appendix for a list of error codes. |
text_stats | JSON object | This attribute may not be present, depending on the assessment pipeline being used. If present, each of the attributes within may or may not be present. The attributes are: r1 (floating-point number): the word overlap between the question and answer text, as a proportion of the answer text; r2 (floating-point number): the bigram overlap between the question and answer text, as a proportion of the answer text; r3 (floating-point number): the trigram overlap between the question and answer text, as a proportion of the answer text; lcs (integer): the longest common subsequence shared by the question and answer; feature_count (integer): a count of the features found in the answer; word_count (integer): a count of the words found in the answer. |
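Since sentence_scores, suspect_tokens and textual_errors all refer back to the submitted text by character index, a client typically slices the original text using those indices. The following is a minimal sketch of that idea, assuming the indices are (start, end) offsets into the submitted string; the function and variable names are illustrative only and are not part of the API.

```python
def summarise_results(submitted_text: str, results: dict) -> str:
    """Print sentence and token information and return the text with the
    suggested corrections applied (a sketch, not part of the API)."""
    for start, end, score in results.get("sentence_scores", []):
        print(f"sentence {submitted_text[start:end]!r} scored {score}")

    for start, end in results.get("suspect_tokens", []):
        print(f"suspect token (no suggestion available): {submitted_text[start:end]!r}")

    # Apply suggested corrections from right to left so that earlier
    # character offsets remain valid as the text changes length.
    corrected = submitted_text
    errors = sorted(results.get("textual_errors", []), key=lambda e: e[0], reverse=True)
    for start, end, suggestion, error_code in errors:
        corrected = corrected[:start] + suggestion + corrected[end:]
    return corrected
```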
Results not retrieved
In addition to the general possible responses outlined earlier in this document, there are a few specific reasons why results may not be retrieved.
Reason | HTTP status code | JSON response |
---|---|---|
Results are not yet ready. Wait at least 1 second and try again. See also Waiting for results below. | 200 | {"type": "results_not_ready", "estimated_seconds_to_completion": 5.7, "code": 200} |
There was insufficient English text in the answer to assign a score. | 200 | {"type": "failure", "message": "insufficient_english_text", "code": 200} |
A sentence in the answer was so long that assessment could not be completed. | 200 | {"type": "failure", "message": "sentence_too_long", "code": 200} |
A token (word) in the answer was so long that assessment could not be completed. | 200 | {"type": "failure", "message": "token_too_long", "code": 200} |
An unspecified error meant that assessment of the answer could not be completed. | 200 | {"type": "failure", "message": "unspecified_error", "code": 200} |
No submission was found with the specified ID. | 404 | {"type": "error", "code": 404, "message": "id not found"} |
Waiting for results
The system generally takes a few seconds to assess a piece of text. If results for a piece of text are not available when this API endpoint is called, the anticipated time remaining until the results become available is returned in the estimated_seconds_to_completion response attribute. A client which wants to receive results as soon as possible (for example, because it needs to return results to its users as quickly as possible) should not poll in a tight loop, but must wait at least 1 second before requesting results again. A client which does not need results as quickly as possible can, of course, wait an arbitrary amount of time before requesting results again. In either case, a more sophisticated approach might take the estimated seconds to completion into account instead of polling at a fixed interval. Note, however, that the estimated seconds to completion is only a guide: assessment of a particular piece of text may be faster or slower than expected, depending on its characteristics. If it is slower than expected, the estimated seconds to completion could reach 0 and remain there until assessment of the text has completed.
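The following is a minimal polling sketch along the lines described above. It assumes the third-party `requests` library and placeholder base URL, version, account ID and submission ID values; it never polls more often than once per second and treats estimated_seconds_to_completion only as a guide.

```python
import time

import requests  # third-party HTTP client, assumed for this sketch

def fetch_results(base_url, version, account_id, submission_id, timeout_seconds=120):
    """Poll the results endpoint until assessment completes or fails."""
    url = f"{base_url}/{version}/account/{account_id}/text/{submission_id}/results"
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        body = requests.get(url).json()
        if body.get("type") == "results_not_ready":
            # Wait at least 1 second, even if the estimate is lower or zero.
            time.sleep(max(1.0, body.get("estimated_seconds_to_completion", 1.0)))
            continue
        # "success", "failure" and "error" responses are returned to the caller.
        return body
    raise TimeoutError("results were not ready within the allowed time")
```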
Deleting submissions by an author
Text submissions and their results can be deleted by specifying the author whose submissions should be deleted. If no submissions are found for the specified author ID, this API call will not return an error, but will instead report that 0 submissions were deleted.
If a deletion API call for the same author ID is made multiple times, the same result will be returned, assuming no submissions with that author ID are made in between. For example, if a deletion request reports that 2 submissions were deleted and the same deletion request is made again, it will still respond that 2 submissions were deleted, along with the timestamp at which the original deletion request was processed.
Request
DELETE /VERSION/account/ACCOUNT_ID/author/ID
Parameters:
Parameter | Required? | Format | Description |
---|---|---|---|
version | Yes | | The desired API version. |
account_id | Yes | | Your API account ID. |
id | Yes | Maximum 40 characters (alphanumeric or hyphens) | The ID of the author whose data is to be deleted (the same author ID as specified by author_id when uploading a submission). |
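For example, the deletion request might be made as follows. This is a sketch assuming the third-party `requests` library and the same placeholder base URL, version, account ID and author ID values used in the earlier examples.

```python
import requests  # third-party HTTP client, assumed for this sketch

BASE_URL = "https://api.example.com"  # placeholder host
VERSION = "v1"                        # placeholder API version
ACCOUNT_ID = "1234"                   # placeholder account ID
AUTHOR_ID = "author-001"              # the author_id used when submitting

response = requests.delete(f"{BASE_URL}/{VERSION}/account/{ACCOUNT_ID}/author/{AUTHOR_ID}")
body = response.json()
if response.status_code == 200:
    print(f"{body['submissions_deleted']} submissions deleted at {body['timestamp']}")
else:
    print(f"deletion failed: {body.get('message')}")
```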
Response
Success
HTTP status code: 200
Example response body JSON:
{
"type":"success","code":200,"submissions_deleted":2,"timestamp":"2018-09-11T11:15:14Z"
}
Attribute name | Format | Description |
---|---|---|
type | | Always "success". |
code | | Always 200. |
submissions_deleted | Integer | The number of submissions deleted. An integer >= 0; it will be 0 if no submissions were found with the specified author ID. |
timestamp | UTC timestamp string in ISO-8601 format | The time when the deletion request was processed. |
Failure
The request can fail if the author ID format is invalid.
HTTP status code: 400
Example response body JSON:
{"type": "error", "code": 400, "message": "id too long"}
type is always "error". The message attribute value can vary depending on the specific error, as shown by the examples below.
Error | Code | Example message |
---|---|---|
Length limit is exceeded for the author ID. | 400 | "id too long" |
Invalid format for the author ID. | 400 | "id must only contain alphanumerics and hyphens" |