Text Classification API

You define the model. We classify your text.

Build your model using keywords. Call our REST API and we’ll classify your text, document or web page for you. Improve and fine-tune your models over time for better results.

Liet analyses your text against your models and assigns a score for each model. Keywords are weighted for relevance, and synonyms and close matches are used where appropriate. All you see is the final score for each model.

Simple and straightforward.

Classify: Legal - Medical - Technology - Business - Covid - Academic - Government - Your Domain

Try it out


Getting Started — Postman or API Key

Try out our test API right now. No account required. No sign-up required.

API Key

A valid API key is needed to access every endpoint. Include the API key as a request header. The following test key can be used:

X-API-Key: JWQMG2WD0UBGVC99RTLX

If you don’t want to use our API from your own app just yet, you can get started by downloading and importing the Postman collection below. This collection contains calls to all endpoints and includes a working API Key. Adjust the requests, data and models to suit your requirements.

Download Postman Collection

Read our Swagger documentation for a full list of available endpoints, and for a detailed breakdown of Request schemas.

Host: https://lietpreprod.azurewebsites.net/

Swagger
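As a rough illustration of how the header fits into a call from your own app, the sketch below prepares a request using only Python's standard library. The host and test key come from this page. The path "/hypothetical/classify", the POST method, the JSON body and the "Text" field are assumptions made for the example, so check Swagger for the real endpoint paths and schemas.

```python
import json
import urllib.request

BASE_URL = "https://lietpreprod.azurewebsites.net/"  # host listed on this page
TEST_API_KEY = "JWQMG2WD0UBGVC99RTLX"                 # public test key from this page


def build_request(path, payload, api_key=TEST_API_KEY):
    """Prepare (but do not send) a request that carries the X-API-Key header.

    `path` is a placeholder, and POST with a JSON body is an assumption:
    the real paths, methods and schemas are defined in Swagger.
    """
    return urllib.request.Request(
        url=BASE_URL.rstrip("/") + "/" + path.lstrip("/"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical path and body field, for illustration only.
request = build_request("/hypothetical/classify", {"Text": "Masks are required."})
print(request.get_method(), request.full_url)
```

Once the path and payload match Swagger, passing the prepared request to urllib.request.urlopen would send it.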

But remember, Liet.io is still under development. It should not be used in production apps at this time. Sign up to our newsletter below and we’ll notify you when it’s production ready.

Covid-19 Model

Auto Classify Request

Defining Requests, Models & Keywords

Full definitions of all endpoints and schemas are on the API’s Swagger page. In most cases, the values should be self-explanatory. This is a brief description of the less obvious values and how they are used.

A Model is where you define a set of keywords. You can pass multiple models in each request, and a score between 1 and 100 is returned for each of them. The Covid-19 example model opposite shows that each keyword is made up of six components:

Word The word or phrase you want to match against. Phrases with multiple words are considered stronger matches and are weighted higher than single words.

Weighting The default weighting for a word is 1. The “Weighting” value allows you to set a weighting between 1 and 5. Setting a high weighting on important keywords allows you to boost the score for the model when that word or phrase is found.

For example, “Covid” and “Covid-19” are given a weighting of 5 in the sample opposite, as their presence in the text will ALWAYS signify that the text is to some degree about Covid. Keywords such as “mask” or “vaccine,” while indicative of a Covid-related piece of text in the current climate, will not always be so, as they are also likely to appear in text related to other pandemics.

MustHave When you set “MustHave” to true for a keyword, you are saying that at least one of the keywords defined as MustHaves must be present in the text. If none are present, the model will not match to the text regardless of what other lesser keywords might be found.

CaseSensitive This should be used when you are certain that every relevant occurrence of the word is capitalized, or where ignoring capitalization can lead to false matches. For example, a case-sensitive “Trump” keyword in the US Politics model matches references to Donald Trump. If lowercase matches were allowed, false matches could occur, because “trump” is also an ordinary verb and noun with its own meanings.

Case sensitive matches are scored slightly higher than regular matches.

AllowPlurals Set to true by default, this will cause a match for keywords such as “mask” where “masks” is found. You should always allow plurals unless there is a domain-specific reason not to.

AllowSynonyms Synonym matches are useful but can be unreliable. For example, applying synonyms to “mask” will correctly match “covering” if it finds it, which would catch instances of “face covering,” a less common term for masks in the Covid context. But it would also match “disguise” and “concealment,” words that are incorrect for the Covid model.

For this reason, synonym matches are scored considerably lower than regular matches. However, don’t let this deter you from allowing synonyms, as their use can improve the overall quality of the matches despite the false positives.
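To make the six components concrete, here is a minimal sketch of how a model might be assembled in Python before being serialised to JSON. The six field names, the default Weighting of 1, the 1 to 5 range and the AllowPlurals default come from the descriptions above. The other defaults, the "Name"/"Keywords" envelope, the build_model helper and the MustHave choices are assumptions for illustration, so compare the output with the schema in Swagger before using it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Keyword:
    Word: str
    Weighting: int = 1            # documented default; allowed range is 1 to 5
    MustHave: bool = False        # default assumed for this sketch
    CaseSensitive: bool = False   # default assumed for this sketch
    AllowPlurals: bool = True     # documented default
    AllowSynonyms: bool = False   # default assumed for this sketch

    def __post_init__(self):
        # Reject weightings outside the documented range early.
        if not 1 <= self.Weighting <= 5:
            raise ValueError(f"Weighting must be between 1 and 5, got {self.Weighting}")


def build_model(name, keywords):
    """Wrap keywords in a model dict; the "Name"/"Keywords" envelope is a guess."""
    return {"Name": name, "Keywords": [asdict(k) for k in keywords]}


covid_model = build_model("Covid-19", [
    Keyword("Covid", Weighting=5, MustHave=True),     # MustHave chosen for illustration
    Keyword("Covid-19", Weighting=5, MustHave=True),
    Keyword("mask", AllowSynonyms=True),
    Keyword("vaccine"),
])
print(json.dumps(covid_model, indent=2))
```

Validating the weighting locally is a convenience only; the API may apply its own validation rules.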

The quickest way to test all these values is to run some of the sample requests in the Postman project mentioned earlier. Tweaking the keywords and options will allow you to see how the model score changes as keywords are added, removed or edited.
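For intuition about how MustHave, CaseSensitive and AllowPlurals interact, the toy sketch below mimics them locally with regular expressions. It is not Liet's matching or scoring algorithm, which is not published here. The helper names, the "trailing s" plural rule and the behaviour when a model defines no MustHave keywords are all assumptions.

```python
import re


def keyword_found(text, word, case_sensitive=False, allow_plurals=True):
    """Whole-word search for `word`, optionally accepting a trailing "s"."""
    pattern = r"\b" + re.escape(word) + ("s?" if allow_plurals else "") + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


def passes_must_have(text, keywords):
    """True if at least one MustHave keyword is present in the text.

    Assumption: a model with no MustHave keywords always passes the gate.
    """
    must = [k for k in keywords if k.get("MustHave")]
    if not must:
        return True
    return any(
        keyword_found(
            text,
            k["Word"],
            k.get("CaseSensitive", False),
            k.get("AllowPlurals", True),
        )
        for k in must
    )


covid_keywords = [{"Word": "Covid", "MustHave": True}, {"Word": "mask"}]

print(passes_must_have("Masks are required on trains", covid_keywords))
print(passes_must_have("Covid rules: wear a mask", covid_keywords))
print(keyword_found("That will trump everything", "Trump", case_sensitive=True))
```

The first text mentions masks but no MustHave keyword, so the gate rejects it; the second contains "Covid" and passes; the third shows a case-sensitive "Trump" keyword ignoring the lowercase verb.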

Auto Classification

It’s early days for Liet.

We expect that it will be used primarily to classify industry-specific data: legal documents for legal firms, medical resources for hospitals or academic departments, government documents for specific government departments and outputs.

If this proves true, users will be defining and maintaining their own models. This is an area we’ll be expanding over the coming weeks and months as features are added to Liet.

But we could be wrong about these use cases.

During testing, we’ve been creating custom models and testing against web content from news, industry and government sources.

These models form part of the default models applied when requests are made to our Auto Classification endpoints.

Swagger

As testing continues, we’ll be expanding these models and branching out into more domain-specific models, for example legal and medical models. How far we expand on these depends very much on how Liet is used over time.

In particular, it depends on whether users rely on our Auto Classification or create their own custom models.