Introduction

Welcome to the Identrics machine learning services documentation. The Identrics API exposes endpoints that give access to various machine learning models and text-processing services.

We have language bindings in curl, Python, Java and JavaScript. You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.

Contexts service

Provides information about the application contexts loaded in the current Trinity instance. In most cases an application context holds information about a machine learning model loaded on that instance. For example, the application context named "t_bg_Traditional_media_Bussiness_RAkELd-952" corresponds to a multi-label topic category classifier for Bulgarian traditional media business content.

Get application contexts

Returns the context metadata

curl -X POST http://10.12.20.16:8080/services/contexts/getApps

The above command returns JSON structured like this:

[
    {
        "comment": "You can clone this view to generate custom views.",
        "description": "Trinity customization for Classifier for multi-label tagger and sentiment analysis.",
        "title": "Classifier service context",
        "author": "Deyan Peychev",
        "collections": 0,
        "thumbnail": "media/logo.png",
        "name": "cat_bg_Traditional_media_80"
    },
    {
        "comment": "You can clone this model to generate custom models",
        "description": "Sentiment classifier for English traditional media business content",
        "title": "Sentiment classifier for English traditional media business content",
        "author": "Deyan Peychev",
        "collections": 0,
        "thumbnail": "media/logo.png",
        "name": "s_en_Traditional_media_Bussiness_SMO-707"
    },
    {
        "comment": "You can clone this model to generate custom models",
        "description": "Sentiment classifier for Bulgarian social media business content",
        "title": "Sentiment classifier for Bulgarian social media busines content",
        "author": "Deyan Peychev",
        "collections": 0,
        "thumbnail": "media/logo.png",
        "name": "s_bg_Social_media_Bussines_SMO-92"
    }
]

HTTP Request

http://10.12.20.16:8080/services/contexts/getApps

Response Properties

Property Description
comment The comment for the application context.
description Description for the application context.
title Title of the application context.
author Author of the application context.
collections The number of data collections assigned to the application context.
thumbnail Thumbnail.
name System name of the application context. This name is used in all other API calls to refer to the context in requests.
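
As a minimal sketch of consuming this response, the Python snippet below indexes the context metadata by the name property so a context can be looked up before it is used in other calls. The helper index_contexts is hypothetical, the sample is trimmed from the response shown above rather than fetched from a live instance, and it assumes the service returns standard JSON with quoted keys.

```python
import json

# Trimmed sample of the getApps response shown above (not fetched live).
SAMPLE_RESPONSE = """
[
  {"title": "Classifier service context", "collections": 0,
   "name": "cat_bg_Traditional_media_80"},
  {"title": "Sentiment classifier for English traditional media business content",
   "collections": 0, "name": "s_en_Traditional_media_Bussiness_SMO-707"}
]
"""


def index_contexts(response_text):
    """Map each context's system name to its metadata dictionary."""
    return {ctx["name"]: ctx for ctx in json.loads(response_text)}


contexts = index_contexts(SAMPLE_RESPONSE)
print(contexts["cat_bg_Traditional_media_80"]["title"])
```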

Get application context names

Returns the context names

curl -X POST http://10.12.20.16:8080/services/contexts/getAppNames

The above command returns JSON array:

[
    "t_bg_Traditional_media_Bussiness_RAkELd-952",
    "classifier",
    "cat_bg_Traditional_media_80",
    "s_en_Traditional_media_Bussiness_SMO-707",
    "s_bg_Social_media_Bussines_SMO-92",
    "t_bg_Traditional media_Automotive_SMO_0.899",
    "t_bg_Social media_Pharma_SMO_0.617",
    "t_bg_Traditional media_Automotive_SMO_0.927",
    "fibep_cat_en_Traditional_media_RAkELd-51",
    "s_en_Traditional_media_Bussiness_SMO-743",
    "t_bg_Social media_Pharma_SMO_0.901",
    "s_en_Traditional_media_Bussiness_SMO-796",
    "nace_bg_Traditional_media_937",
    "ame_en_Traditional_media_SMO_927",
    "fibep_sentiment_en_Traditional_media_DL4J-87",
    "s_bg_Traditional_media_Bussiness_SMO-81",
    "s_en_Traditional_media_Bussiness_SMO-762"
]

If the same command is applied to our NER service address, the response JSON would be:

[
    "en",
    "de",
    "id",
    "bg",
    "nl",
    "fr",
    "tr",
    "zh-cn",
    "es",
    "sv"
]

Response Properties

Property Description
message JSON array with the names of all the contexts within a given service.

HTTP Request

http://10.12.20.16:8080/services/contexts/getAppNames
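
Context names contain irregular spellings and spaces, so it can help to check a name against the getAppNames result before calling another endpoint. The sketch below is one possible client-side check; require_context is a hypothetical helper, and the names list is copied by hand from the sample response rather than fetched.

```python
import difflib


def require_context(name, available):
    """Return name if it is available, else raise with close suggestions."""
    if name in available:
        return name
    hints = difflib.get_close_matches(name, available, n=3)
    raise KeyError(f"unknown context {name!r}; close matches: {hints}")


# A few names copied from the getAppNames response above.
names = [
    "s_bg_Social_media_Bussines_SMO-92",
    "s_en_Traditional_media_Bussiness_SMO-707",
    "t_bg_Traditional media_Automotive_SMO_0.927",
]

try:
    # Note the "corrected" spelling, which the service would not recognise.
    require_context("s_bg_Social_media_Business_SMO-92", names)
except KeyError as err:
    print(err)
```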

Language Detection

This is a language detection service. It accepts text and returns a result with the detected language code and score.

Lang Detect

In the following snippet, the language is Bulgarian:

curl -X POST -d \
'json=
[
    "langdetect",
    "Тази кампания предхожда началото на продажбите на модела, които започват в Европа от тази пролет. Хечбекът със системата  quattro е с турбодвигател имащ пет цилиндъра и обем 2.5 литра. Максималният въртящ момент е 450 Nm."
]' \
http://10.50.30.10:8080/services/langdetect/langDetect

The above command returns a message with the detected language code.

Returns the language code for a given text.

HTTP POST Request

http://10.50.30.10:8080/services/langdetect/langDetect

Query Parameters

Parameter Description
context Default context name
text The text whose language will be detected.

Response Properties

Property Description
message The message contains language code for given text input
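
The curl example sends a single form field named json whose value is a JSON array of the positional parameters (context, then text). A rough Python equivalent is sketched below. The helper build_request is hypothetical, and it percent-encodes the form body, whereas curl -d sends it as written, so this assumes the server applies standard form decoding. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_request(url, *args):
    """Build (but do not send) a POST request with a form field json=[...]."""
    payload = json.dumps(list(args), ensure_ascii=False)
    body = urllib.parse.urlencode({"json": payload}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_request(
    "http://10.50.30.10:8080/services/langdetect/langDetect",
    "langdetect",  # context name, as in the curl example
    "Тази кампания предхожда началото на продажбите на модела.",
)
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```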

Stemming

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word.
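
To make the idea concrete, here is a deliberately naive suffix-stripping sketch in Python. It is not the algorithm used by the stemming service; the suffix list and the minimum stem length are assumptions chosen only to show that a stem such as "issu" need not be a dictionary word.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative only, not a real rule set


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix if enough of the word remains."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        stem = lowered[: len(lowered) - len(suffix)]
        if lowered.endswith(suffix) and len(stem) >= min_stem:
            return stem
    return lowered


words = ["issued", "missiles", "test", "is"]
print([naive_stem(w) for w in words])
```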

Stem

In the following snippet, the language is English:

curl -X POST -d \
'json=
[
    "stare-en",
    "It issued a statement in response to a speech by South Korea President Moon Jae-in on Thursday. Meanwhile, early on Friday North Korea test-fired two missiles into the sea off its eastern coast, the South Korean military said. It is the sixth such test in less than a month."
]' \
http://10.50.30.17:8080/services/stemmer/stem

The above command returns a message with the stemmed text.

Returns stemmed text for a given text.

HTTP POST Request

http://10.50.30.17:8080/services/stemming/stem

Query Parameters

Parameter Description
context context name, typically language code
text The input text string on which stemming algorithm will be applied

Response Properties

Property Description
message Stemmed version of the input text string

Ignorelist

This is an example of an ignorelist for common words in English:

and
or
is
are
last
let
you
yeah

In computing, stop words are words which are filtered out before or after processing of natural language data (text). Stop words are generally the most common words in a language. There is no single universal list of stop words used by all natural language processing tools, and not all tools use such a list. Some tools avoid removing stop words in order to support phrase search, classification or vectorization.
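
The filtering idea can be sketched locally in a few lines of Python using the example ignorelist from this section. This is only an illustration: the whitespace tokenization and case-insensitive matching are assumptions, and the service may tokenize and match differently.

```python
# The example ignorelist from this section.
IGNORELIST = {"and", "or", "is", "are", "last", "let", "you", "yeah"}


def filter_ignorelist(text, ignorelist=IGNORELIST):
    """Drop whitespace-separated tokens that appear in the ignorelist."""
    kept = [tok for tok in text.split() if tok.lower() not in ignorelist]
    return " ".join(kept)


print(filter_ignorelist("It is the sixth such test and you are late"))
```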

Filter

In the following snippet, the language is English:

curl -X POST -d \
'json=
[
    "stare-en",
    "It issued a statement in response to a speech by South Korea President Moon Jae-in on Thursday. Meanwhile, early on Friday North Korea test-fired two missiles into the sea off its eastern coast, the South Korean military said. It is the sixth such test in less than a month."
]' \
http://10.50.30.17:8080/services/ignorelist/filter

The above command returns a message with the filtered text.

Returns shorter version of a given text based on the ignorelist used by the given context.

HTTP POST Request

http://10.50.30.17:8080/services/ignorelist/filter

Query Parameters

Parameter Description
context context name, typically language code
text The input text string on which an ignorelist will be applied

Response Properties

Property Description
message Ignorelisted version of the input text string

Text Cleaner

Text Cleaner is a quick, web-based way to fix and clean up text, for example text copied and pasted between web services. It uses langDetect and then executes a pipeline built from multiple stages combined in a strict order. The pipeline can be configured differently depending on the task: for sentiment analysis special symbols such as 😊 are kept, while for document similarity the text is stemmed to increase accuracy. The text cleaner can be combined with replaceEntities and removeEntities.

Stages Description Table

Here is an example configuration for text-cleaner pipelines.

context.title=Trinity Text Cleaner
context.description=Text formatting pipelines
context.author=Nikola Velichkov
context.comment=poc1
context.thumbnail = media/logo.png

context.strictClean.pipeline = HTML,URLCLEAN,DUPLICATES,LOWERCASE,SPLIT,IGNORELIST,JOIN,SPLIT,SYNONYMS,JOIN,SPLIT,DICTIONARY,JOIN,REMOVE_PUNCTUATION,REMOVE_SYMBOLS,REMOVE_NUMBERS,STEMM,WHITESPACE
context.looseClean.pipeline = HTML,URLCLEAN,DUPLICATES,WHITESPACE
context.checkIRI.pipeline = URLCLEAN

SPLIT = split
SPLIT.mode = split
SPLIT.input = @xfer
SPLIT.output = @xfer

JOIN = split
JOIN.mode = join
JOIN.input = @xfer
JOIN.output = @xfer

STEMM = stemming-stage
STEMM.language = @language
STEMM.input = @xfer
STEMM.output = @xfer

HTML = html
HTML.input = @xfer
HTML.output = @xfer

IGNORELIST = ignorelist-stage
IGNORELIST.language = @language
IGNORELIST.input = @xfer
IGNORELIST.output = @xfer
...
Name Description
SYNONYMS Synonyms are normalized into their canonical form.
DICTIONARY Rare words removal, only dictionary words are kept.
STEMM Stemming
HTML HTML removal
LOWERCASE Lower casing
REMOVE_PUNCTUATION Punctuation removal
REMOVE_SYMBOLS Special symbols removal
REMOVE_NUMBERS Number removal
DUPLICATES Duplicated sentences removal
URLCLEAN URL removal
QUICKCLEAN Non-alphanumeric removal
WHITESPACE Removes multiple spaces and strips/trims
SPLIT Tokenization
JOIN Detokenization
IGNORELIST Frequent/Rare/Custom words removal

The pipeline definitions of looseClean and strictClean appear in the example configuration above.
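
Since each pipeline is declared as a comma-separated list of stage names, the stage order can be read straight from the configuration. The Python sketch below parses lines of the form context.NAME.pipeline = STAGE,STAGE,... copied from the example configuration. The parser parse_pipelines is hypothetical and is not part of the text cleaner itself.

```python
import re

# Pipeline lines copied from the example configuration in this section.
CONFIG = """
context.looseClean.pipeline = HTML,URLCLEAN,DUPLICATES,WHITESPACE
context.checkIRI.pipeline = URLCLEAN
SPLIT = split
SPLIT.mode = split
"""

PIPELINE_LINE = re.compile(r"^context\.(\w+)\.pipeline\s*=\s*(.+)$")


def parse_pipelines(config_text):
    """Return {pipeline name: ordered list of stage names}."""
    pipelines = {}
    for line in config_text.splitlines():
        match = PIPELINE_LINE.match(line.strip())
        if match:
            name, stages = match.groups()
            pipelines[name] = [s.strip() for s in stages.split(",")]
    return pipelines


print(parse_pipelines(CONFIG))
```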

Clean

In the following curl http POST sample the language is Croatian

curl -X POST -d \
'json=
[
    "text-cleaner",
    "looseClean",
    "<div class=\"message\"><p>Zagrizi svoju omiljenu pile\u0107u poslasticu kako bi vikend bio jo\u0161 bolji! \uD83D\uDE0A</p>\r\n<p>\u010Cekanje u redu? Putovanje do restorana? Ma nema potrebe, dovoljno je samo par klikova i sti\u017Ee tvoja omiljena KFC hrana!\uD83E\uDD70</p>\r\n<p>Naru\u010Di preko https://dostava.kfc.hr/, putem Glova, Pauze ili Wolta te na aplikaciji KFC Hrvatska!</p></div>"
]' \
http://10.50.30.10:8080/services/text-cleaner/clean

curl -X POST -d \
'json=
[
    "text-cleaner",
    "strictClean",
    "<div class=\"message\"><p>Zagrizi svoju omiljenu pile\u0107u poslasticu kako bi vikend bio jo\u0161 bolji! \uD83D\uDE0A</p>\r\n<p>\u010Cekanje u redu? Putovanje do restorana? Ma nema potrebe, dovoljno je samo par klikova i sti\u017Ee tvoja omiljena KFC hrana!\uD83E\uDD70</p>\r\n<p>Naru\u010Di preko https://dostava.kfc.hr/, putem Glova, Pauze ili Wolta te na aplikaciji KFC Hrvatska!</p></div>"
]' \
http://10.50.30.10:8080/services/text-cleaner/clean

HTTP POST Request

http://10.50.30.10:8080/services/text-cleaner/clean

Query Parameters

Parameter Description
context Default context name
pipeline Name of the pipeline to run, for example looseClean (defined by context.looseClean.pipeline).
text The input text string processed by the pipeline; stages read it from @xfer (for example STEMM.input = @xfer).

Response Properties

Property Description
message Cleaned version of the input text string

Classification

The general purpose of the classifier is to assign (categorical) class labels to a particular document. The set of classes can be a list of categories, a taxonomy hierarchy, a sentiment scale or just boolean true / false. In most cases the set of classes is defined by the client. The Identrics classifier supports three different classification tasks, depending on the kind of result representation expected in the prediction output.

Binary classification task

Binary classification is the problem of classifying content into one of two categories (classes). Typical binary classification scenarios include filtering undesired messages in a mailbox (“spam versus ham”) or filtering which documents in a data set are or are not relevant to a specific topic of interest.

Multi-class classification task

Multi-class classification is the problem of classifying content into one of three or more categories (classes). A typical multi-class classification scenario is sentiment analysis, using three or more fixed classes for positive, neutral and negative mentions of entities in the document text, or for the general sentiment of the whole document.

Multi-label classification task

Multi-label classification is the problem of assigning a set of (one or many) labels to content. It is a generalization of multi-class classification, which is the single-label problem: in the multi-label problem there is no constraint on how many classes an instance can be assigned to. A typical multi-label scenario is a document tagger that uses a set of categories from a client taxonomy or just a desired list of tags.
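
The difference between the three tasks is easiest to see in the shape of the output. The Python sketch below is a generic illustration, not the Identrics classifier: the per-label scores, the 0.5 threshold and the function names are all assumptions made for the example.

```python
def binary_decision(score, threshold=0.5):
    """Binary task: a single yes/no answer."""
    return score >= threshold


def multiclass_decision(scores):
    """Multi-class task: exactly one label, the highest-scoring one."""
    return max(scores, key=scores.get)


def multilabel_decision(scores, threshold=0.5):
    """Multi-label task: every label whose score passes the threshold."""
    return [label for label, score in scores.items() if score >= threshold]


# Hypothetical scores, invented for illustration.
sentiment = {"positive": 0.1, "neutral": 0.2, "negative": 0.7}
tags = {"automotive": 0.8, "pharma": 0.1, "business": 0.6}

print(binary_decision(0.9))            # spam-versus-ham style answer
print(multiclass_decision(sentiment))  # one sentiment class
print(multilabel_decision(tags))       # zero or more tags
```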

Classify text

In the following curl http POST sample the model name is "t_bg_Traditional media_Automotive_SMO_0.927"

curl -X POST -d \
'json=
[
    "t_bg_Traditional media_Automotive_SMO_0.927",
    "Тази кампания предхожда началото на продажбите на модела, които започват в Европа от тази пролет. Хечбекът със системата quattro е с турбодвигател имащ пет цилиндъра и обем 2.5 литра. Максималният въртящ момент е 450 Nm."
]' \
http://10.12.20.16:8080/services/classifier/classify

The above command returns JSON structured like this:

[
    {
        "annotationId" : null,
        "annotationSource" : "t_bg_Traditional media_Automotive_SMO_0.927",
        "documentId" : null,
        "annotationClass" : "http://identrics.net/resource/category/PROSAL",
        "annotationType" : "taxonomy"
    }
]

In the following curl http POST sample the model name is "fibep_sentiment_en_Traditional_media_DL4J-87"

curl -X POST -d \
'json=
[
    "fibep_sentiment_en_Traditional_media_DL4J-87",
    "Saudi Arabia has said it will carry out urgent reprisals as it accused Iran of being behind a late-night cruise missile attack by Houthi rebel fighters on a Saudi international airport that injured 26 people. The Saudi foreign ministry said the Command of Joint Forces of the Coalition promised it \"will take urgent and timely measures to deter these Iranian-backed terrorist Houthi militias\". The attack on Abha airport was condemned across the Middle East and by the US defence department. The Saudi-backed Yemeni government, which has been fighting a four-year civil war against the Houthi rebels, claimed the missile directed at the airport had been supplied by Iran, even claiming Iranian experts were present at the missiles launch. The latest major Trump resignations and firings Read more Iran strongly denies Saudi claims of aiding the Houthi movement. The Houthi rebels insist they have a right to defend themselves from a Saudi directed blockade, and reported an initial Saudi reprisal that hit densely populated areas in the north of the country. Diplomats will fear that the conflict in Yemen is spilling over into the dispute between Washington and Tehran, particularly if the US backs claims that Iran is directing the increasingly sophisticated Houthi attacks deep into Saudi territory. A Houthi military spokesman promised the group would target every airport in Saudi Arabia and that the coming days would reveal \"big surprises\". No fatalities were reported in the airport attack, which hit the arrivals hall, but the number of civilians wounded was the largest in any Houthi attack inside Saudi Arabia. The Houthis al-Masirah satellite news channel said the missile hit its intended target, Abha airport, near the Yemen border, disrupting flights. The rebels have also carried out drone strikes on Saudi oil installations and may have been responsible for recent attacks on oil tankers off the coast of the United Arab Emirates. 
A UAE-led investigation into the shipping attacks was unable to identify the culprits, but said a state-supported actor was involved. The precise extent to which Iran is providing military assistance to the Houthi movement is a matter of dispute, but UN reports suggest it has provided weaponry. Iran operates through surrogates, but has looked as if it was seeking ways to reduce tensions with the US. The Japanese prime minister, Shinzo Abe, arrived in Tehran on Wednesday, carrying what Iran expects is a message on behalf of Donald Trump that sets out US conditions for direct talks. Iran is threatening to pull out of the 2015 nuclear deal unless the US relaxes economic sanctions that are crippling the the countrys economy. Trump pulled the US out of the deal last year. A spokesman for the Saudi-led coalition, Turki al-Maliki, was quoted by the state-run Al Ekhbariya news channel as saying three women and two children were among those hurt, and that eight people were taken to hospital while 18 sustained minor injuries. A Houthi spokesman, Mohamed Abdel Salam, said the attack was in response to Saudi Arabias \"continued aggression and blockade on Yemen\". Earlier in the week, he said attacks on Saudi airports were \"the best way to break the blockade\" of the airport in Yemens capital, Sanaa, which the rebels overran in late 2014. Tens of thousands of civilians have been killed in the conflict since, relief agencies say."
]' \
http://10.12.20.16:8080/services/classifier/classify

The above command returns JSON structured like this:

[
    {
        "annotationId" : null,
        "annotationSource" : "fibep_sentiment_en_Traditional_media_DL4J",
        "documentId" : null,
        "annotationClass" : "-1",
        "annotationType" : "sentiment"
    }
]

Returns the class predictions for the given machine learning model and input text.

HTTP POST Request

http://10.12.20.16:8080/services/classifier/classify

Query Parameters

Parameter Description
model name The name of the machine learning model used for class prediction.
text The text to be classified.

Response Properties

Property          Default    Description
annotationId      null       Annotation unique identifier. This ID is populated only by the "Annotation" endpoint.
annotationSource  null       The name of the prediction model.
documentId        null       The unique identifier of the document.
annotationClass              The class label as defined in the classifier model.
annotationType    taxonomy   The type of the annotation according to the business task.
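
A client typically needs only the annotation type and the class label from each returned annotation. The Python sketch below extracts them from the first sample response above. The helper extract_labels is hypothetical, and shortening a taxonomy URI to its last path segment is a convenience assumed here, not something the service does.

```python
import json

# Response copied from the first classify example above.
RESPONSE = """
[
    {
        "annotationId": null,
        "annotationSource": "t_bg_Traditional media_Automotive_SMO_0.927",
        "documentId": null,
        "annotationClass": "http://identrics.net/resource/category/PROSAL",
        "annotationType": "taxonomy"
    }
]
"""


def extract_labels(response_text):
    """Return (annotationType, short label) pairs from a classify response."""
    labels = []
    for annotation in json.loads(response_text):
        full = annotation["annotationClass"]
        short = full.rsplit("/", 1)[-1]  # keep the last URI segment, if any
        labels.append((annotation["annotationType"], short))
    return labels


print(extract_labels(RESPONSE))
```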

Named Entity Recognition

Named Entity Recognition (NER) is an information extraction task that locates and classifies entities in text. Typical classes are Person, Organization, Location, Time, Money, Percentage, etc. Entity classes are not limited to those mentioned; in principle they can be customized depending on the knowledge domain and the specific information to be discovered. The types of extracted entities depend mostly on the training dataset. If a manually annotated corpus is designed to extract market products, employee positions or chemical compounds, those types should be explicitly marked. In general, any entity type can be specified for extraction if it is manually annotated in the training dataset.

Recognize Entities

In the following curl http POST sample the language is Bulgarian

curl -X POST -d \
'json=
[
    "bg",
    "Тази кампания предхожда началото на продажбите на модела, които започват в Европа от тази пролет. Хечбекът със системата quattro е с турбодвигател имащ пет цилиндъра и обем 2.5 литра. Максималният въртящ момент е 450 Nm."
]' \
http://10.50.30.12:8080/services/ner/getAnnotations

The above command returns JSON response structured like this:

[
    {
        "annotationURI": null,
        "index": 12,
        "annotationSource": null,
        "documentId": null,
        "word": "Европа",
        "annotationClass": "LOCATION",
        "endOffset": 81,
        "startOffset": 75,
        "annotationId": null,
        "annotationType": "ner"
    }
]

Entity types may vary between NER models. For example:

[
    "PERSON",
    "LOCATION",
    "ORGANIZATION",
    "PRODUCT",
    "MONEY",
    "DATE",
    "NUMBER",
    "MISC",
    "GPE"
]

Returns entities and their classes extracted from text.

HTTP POST Request

http://10.50.30.12:8080/services/ner/getAnnotations

Query Parameters

Parameter Description
language Natural language code - "en" for English, "fr" for French etc.
text The text from which the entities will be extracted.

Response Properties

Property          Default    Description
annotationId      null       Annotation unique identifier. This ID is populated only by the "Annotation" endpoint.
annotationSource  null       The name of the prediction model.
documentId        null       The unique identifier of the document.
annotationClass              The class label as defined in the classifier model.
annotationType    ner        The type of the annotation according to the business task.
word                         Exact portion of text corresponding to the entity mention.
index                        Word position of the entity mention inside the text.
startOffset                  Character position of the beginning of the entity mention inside the text.
endOffset                    Character position of the end of the entity mention inside the text.
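
In the sample response above, startOffset appears to be zero-based and endOffset exclusive, which matches Python slicing. The sketch below checks this against the first sentence of the example text. It is an observation from that single example, not a documented guarantee, and mention_text is a hypothetical helper.

```python
# First sentence of the Bulgarian example text used in the request above.
TEXT = (
    "Тази кампания предхожда началото на продажбите на модела, "
    "които започват в Европа от тази пролет."
)

# Annotation fields copied from the sample response.
annotation = {"word": "Европа", "startOffset": 75, "endOffset": 81}


def mention_text(text, ann):
    """Slice the mention out of the text using the character offsets."""
    return text[ann["startOffset"]:ann["endOffset"]]


print(mention_text(TEXT, annotation) == annotation["word"])
```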

Supported languages

English, German, Indonesian/Bahasa, Bulgarian, Dutch, French, Turkish, Simplified Chinese, Spanish and Swedish. The language codes correspond to the context names.

Replace Entities

In the following curl http POST sample the language is English

curl -X POST -d \
'json=
[
    "en",
    "String text = BUCHAREST (Romania), December 28 (SeeNews) - Moldovas designated prime-minister A. Sturza said that he will present the members of his cabinet and his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday morning, and on Sunday they will sign their statements on the assets they currently own, news agency Moldpress quoted Sturza as saying after a discussion with parliament president Andrian Candu. Sturza also said he will ask the Permanent Bureau - a body of nine members representing the main political formations in Moldovas parliament - to set the date for a vote of confidence. On Thursday Sturza said he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick to the planned date. Moldovan president Nicolae Timofti nominated Sturza, a businessman, as prime minister on December 21. Under Moldovas Constitution, the countrys parliament should vote on the nomination within 15 days. The main political formations in Moldovas parliament are The Alliance for European Moldova, formed by the Liberal Democrat Party Moldova, PLDM, which supports Sturza, and the Democratic Party Moldova, PD, which is against him. Last week, PD and 14 MPs who left the Communist Party last week formed the Social Democratic Platform, which now has 34 deputies out of a total of 101 and claims the prime ministers seat. Moldova remained without a government at the end of October, when the cabinet led by PLDM vice-president Valeriu Strelet collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the Moldovan Republic Party, PSRM, and the Communist Party, PCRM, who accused Strelet of abuse of power and corruption. 
The no-confidence vote came after on October 15, Vlad Filat, leader of PLDM and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished Southeast European country. At the end of November, Moldovas Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29.Sturza served as prime minister of Moldova for nine months in 1999, appointed by then president Petru Lucinschi. After his resignation as prime minister he returned to business and founded several companies, including Rompetrol Moldova in 2002 and Fribourg Capital Investment Fund. According to Top 300 The Wealthiest Men in Romania 2015 published by Romanias Capital magazine, Sturza has a fortune of 38-40 million euro."
]' \
http://10.50.30.12:8080/services/ner/replaceEntities

The above command returns a modified text string:

"String text = BUCHAREST (LOCATION), December 28 (SeeNews) - LOCATION designated
 prime-minister PERSON said that he will present the members of his cabinet and
 his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday morning, and on Sunday they will sign their statements
 on the assets they currently own, news agency ORGANIZATION quoted PERSON as saying after a discussion with parliament president PERSON. PERSON also said he will ask the ORGANIZATION - a body of nine members representing the main political formations in LOCATION
 parliament - to set the date for a vote of confidence. On Thursday PERSON said
 he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick to the planned date. LOCATION president PERSON nominated PERSON, a businessman, as prime minister on December 21. Under LOCATION Constitution, the countrys parliament should vote on the nomination within 15 days. The main political formations in LOCATION
 parliament are The LOCATION for ORGANIZATION, formed by the ORGANIZATION, ORGANIZATION, which supports PERSON, and the ORGANIZATION, ORGANIZATION, which is against him. Last week, ORGANIZATION and 14 MPs who left the ORGANIZATION last week formed the ORGANIZATION, which now has 34 deputies out of a total of 101 and claims the prime ministers seat. LOCATION remained without a government at the end of October, when the cabinet led by ORGANIZATION vice-president PERSON collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the LOCATION Republic Party, ORGANIZATION, and the ORGANIZATION, ORGANIZATION, who accused PERSON of abuse of power and corruption. The no-confidence vote came after on October 15, PERSON, leader of ORGANIZATION and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished MISCELLANEOUS country. At the end of November, LOCATION Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29.PERSON served as prime minister of LOCATION for nine months in 1999, appointed by then president PERSON. After his resignation as prime minister he returned to business and founded several companies, including Rompetrol LOCATION in 2002 and ORGANIZATION. According to Top 300 MISCELLANEOUS in LOCATION 2015 published by LOCATIONs Capital magazine, PERSON has a fortune of 38-40 million euro."

Returns the given input text with entity type instead of entity name.

HTTP POST Request

http://10.50.30.12:8080/services/ner/replaceEntities

Query Parameters

Parameter Description
language Natural language code - "en" for English, "fr" for French etc.
text The text in which the entities will be replaced with their type.
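
The effect of this endpoint can be imitated locally from annotations in the getAnnotations shape, which may help when reasoning about its output. The Python sketch below is an illustration only, not the service implementation. The function rewrite_entities and the hand-written annotations are hypothetical, and it assumes zero-based start offsets with exclusive end offsets. Passing replace_with_class=False deletes the spans instead of replacing them.

```python
def rewrite_entities(text, annotations, replace_with_class=True):
    """Replace each annotated span with its class, or delete it."""
    # Work right to left so earlier offsets stay valid after each edit.
    ordered = sorted(annotations, key=lambda a: a["startOffset"], reverse=True)
    for ann in ordered:
        filler = ann["annotationClass"] if replace_with_class else ""
        text = text[:ann["startOffset"]] + filler + text[ann["endOffset"]:]
    return text


# Hypothetical annotations in the getAnnotations shape, written by hand.
text = "Sturza met Candu in Moldova"
annotations = [
    {"startOffset": 0, "endOffset": 6, "annotationClass": "PERSON"},
    {"startOffset": 11, "endOffset": 16, "annotationClass": "PERSON"},
    {"startOffset": 20, "endOffset": 27, "annotationClass": "LOCATION"},
]

print(rewrite_entities(text, annotations))
print(rewrite_entities(text, annotations, replace_with_class=False))
```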

Remove Entities

In the following snippet, the language is English:

curl -X POST -d \
'json=
[
    "en",
    "String text = BUCHAREST (Romania), December 28 (SeeNews) - Moldovas designated prime-minister A. Sturza said that he will present the members of his cabinet and his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday morning, and on Sunday they will sign their statements on the assets they currently own, news agency Moldpress quoted Sturza as saying after a discussion with parliament president Andrian Candu. Sturza also said he will ask the Permanent Bureau - a body of nine members representing the main political formations in Moldovas parliament - to set the date for a vote of confidence. On Thursday Sturza said he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick to the planned date. Moldovan president Nicolae Timofti nominated Sturza, a businessman, as prime minister on December 21. Under Moldovas Constitution, the countrys parliament should vote on the nomination within 15 days. The main political formations in Moldovas parliament are The Alliance for European Moldova, formed by the Liberal Democrat Party Moldova, PLDM, which supports Sturza, and the Democratic Party Moldova, PD, which is against him. Last week, PD and 14 MPs who left the Communist Party last week formed the Social Democratic Platform, which now has 34 deputies out of a total of 101 and claims the prime ministers seat. Moldova remained without a government at the end of October, when the cabinet led by PLDM vice-president Valeriu Strelet collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the Moldovan Republic Party, PSRM, and the Communist Party, PCRM, who accused Strelet of abuse of power and corruption. 
The no-confidence vote came after on October 15, Vlad Filat, leader of PLDM and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished Southeast European country. At the end of November, Moldovas Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29.Sturza served as prime minister of Moldova for nine months in 1999, appointed by then president Petru Lucinschi. After his resignation as prime minister he returned to business and founded several companies, including Rompetrol Moldova in 2002 and Fribourg Capital Investment Fund. According to Top 300 The Wealthiest Men in Romania 2015 published by Romanias Capital magazine, Sturza has a fortune of 38-40 million euro."
]' \
http://10.50.30.12:8080/services/ner/removeEntities

The above command returns a message like this:

"String text = BUCHAREST (), December 28 (SeeNews) -  designated prime-minister
 said that he will present the members of his cabinet and his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday
 morning, and on Sunday they will sign their statements on the assets they currently own, news agency  quoted  as saying after a discussion with parliament president .  also said he will ask the  - a body of nine members representing the main political formations in
 parliament - to set the date for a vote of confidence. On Thursday  said he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick
 to the planned date.  president  nominated , a businessman, as prime minister on December 21. Under  Constitution, the countrys parliament should vote on the nomination within
 15 days. The main political formations in  parliament are The  for , formed by
 the , , which supports , and the , , which is against him. Last week,  and 14 MPs who left the  last week formed the , which now has 34 deputies out of a total of 101 and claims the prime ministers seat.  remained without a government at the end of October, when the cabinet led by  vice-president  collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the  Republic Party, , and the , , who accused  of abuse of power and corruption. The no-confidence vote came after on October 15, , leader of  and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished  country. At the end of November,  Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29. served as prime minister of  for nine months in 1999, appointed by then president . After his resignation as prime minister he returned to business and founded several companies, including Rompetrol  in 2002 and . According to Top 300  in  2015 published by s Capital magazine,  has a fortune of 38-40 million euro."

Returns the given input text with all entities removed.

HTTP POST Request

http://10.50.30.12:8080/services/ner/removeEntities

Query Parameters

Parameter Description
language Natural language code - "en" for English, "fr" for French etc.
text The text from which the entities will be removed.

Document similarity

Estimates the degree of similarity between texts. Documents are usually treated as similar if they are semantically close and describe similar concepts. On the other hand, “similarity” can also be used in the context of duplicate detection. The Identrics document similarity service does more than similarity estimation: from a more general perspective it is a document repository system, intended for storage, retrieval, indexing and search of text documents.

Create context

curl -X POST -d \
'json=
[
    "stare-en",
    "en"
]' \
http://10.50.30.17:8080/services/docsim/createContext

The above command returns a status message.

To create a context for grouping documents (IDs) run the request:

HTTP POST Request

http://10.50.30.17:8080/services/docsim/createContext

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
language Language code used to select the proper analyzer, which provides better similarity results.

Response Properties

Property Description
message Status message.
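
The curl examples in this section all post a single form field named json whose value is a JSON array of positional parameters. A minimal Python sketch of that encoding follows. The helper name build_request is hypothetical, and percent-encoding the form body is an assumption about what the server accepts. Nothing is sent until urlopen is called.

```python
import json
import urllib.parse
import urllib.request

# Host and path taken from the curl examples above.
DOCSIM_BASE = "http://10.50.30.17:8080/services/docsim"


def build_request(endpoint, params, base_url=DOCSIM_BASE):
    """Build (but do not send) a POST request carrying params as a json= form field."""
    # Positional parameters are serialized as one JSON array, e.g. ["stare-en", "en"].
    form = {"json": json.dumps(params, ensure_ascii=False)}
    # Assumption: the service accepts a standard percent-encoded form body.
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(f"{base_url}/{endpoint}", data=body, method="POST")


request = build_request("createContext", ["stare-en", "en"])
# To actually call the service (requires network access to the host):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The same helper should work for the other docsim endpoints by changing the endpoint name and the parameter list.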

Add document

curl -X POST -d \
'json=
[
    "stare-en",
    "0000001",
    "The body of new text document"
]' \
http://10.50.30.17:8080/services/docsim/add

The above command returns a status message.

Adds the contents of a single document under a unique ID.

HTTP POST Request

http://10.50.30.17:8080/services/docsim/add

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier of the document to be added.
text Text of the document to be added.

Response Properties

Property Description
message Status message.

Update document

curl -X POST -d \
'json=
[
    "stare-en",
    "0000001",
    "The updated body of an already existing document"
]' \
http://10.50.30.17:8080/services/docsim/update

The above command returns a status message.

Updates the contents of a single document identified by its unique ID.

HTTP POST Request

http://10.50.30.17:8080/services/docsim/update

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier of the document - already indexed in the store.
text Text of the document to be updated.

Response Properties

Property Description
message Status message.

Delete document

curl -X POST -d \
'json=
[
    "stare-en",
    "0000001"
]' \
http://10.50.30.17:8080/services/docsim/delete

The above command returns a status message.

Deletes a single document identified by its unique ID.

HTTP POST Request

http://10.50.30.17:8080/services/docsim/delete

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier of the document - already indexed in the store.

Response Properties

Property Description
message Status message.

Build vector store

curl -X POST -d \
'json=
[
    "stare-en"
]' \
http://10.50.30.17:8080/services/docsim/buildVectoreStore

The above command returns a status message.

Builds the vector store and measures the distances between the documents already added.

HTTP POST Request

http://10.50.30.17:8080/services/docsim/buildVectoreStore

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.

Response Properties

Property Description
message Status message.
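
Taken together, the endpoints above suggest an indexing sequence: create the context, add the documents, then build the vector store before asking for similar documents. The sketch below only assembles that sequence as (endpoint, parameters) pairs. The function name indexing_plan is hypothetical, and the ordering is inferred from the endpoint descriptions rather than stated by the service.

```python
def indexing_plan(context, language, documents):
    """Return the ordered (endpoint, params) calls for indexing `documents`.

    `documents` maps a unique document ID to the text of that document.
    """
    plan = [("createContext", [context, language])]
    for doc_id, text in documents.items():
        # Each document is added under its own unique ID.
        plan.append(("add", [context, doc_id, text]))
    # The vector store is built last, over the documents added so far.
    plan.append(("buildVectoreStore", [context]))
    return plan


plan = indexing_plan("stare-en", "en", {"0000001": "The body of new text document"})
for endpoint, params in plan:
    print(endpoint, params)
```

Each pair corresponds to one POST request with the parameters sent as the json form field, as in the curl examples.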

Get similar documents

curl -X POST -d \
'json=
[
    "stare-en",
    "159494598"
]' \
http://10.50.30.17:8080/services/docsim/getSimilar

The above command returns JSON annotations structured like this:


[
    {
        docId: "157423763",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "160949427",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "158611878",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "156433649",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "160293683",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    }
]

To get a list of similar documents (IDs), run the request:

HTTP POST Request

http://10.50.30.17:8080/services/docsim/getSimilar

In this case the request is for a document already indexed in the store.

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier of the document - already indexed in the store.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId The unique identifier of similar document.
annotationClass null The class label as defined in classifier model.
annotationType docsim The type of the annotation according to the business task.
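
A short Python sketch of reading this response follows. Note that the sample response uses the key docId while the property table lists documentId, so the sketch accepts either. It assumes the response body is valid JSON (the sample is displayed with unquoted keys), and the function name similar_ids is hypothetical.

```python
import json


def similar_ids(response_text):
    """Extract the IDs of similar documents from a getSimilar response body."""
    annotations = json.loads(response_text)
    ids = []
    for annotation in annotations:
        # The sample response uses "docId"; the property table lists "documentId".
        doc_id = annotation.get("docId") or annotation.get("documentId")
        if doc_id is not None:
            ids.append(doc_id)
    return ids


# Two entries from the sample response, reduced to the fields used here.
sample = json.dumps([
    {"docId": "157423763", "annotationClass": None, "annotationType": "docsim"},
    {"docId": "160949427", "annotationClass": None, "annotationType": "docsim"},
])
print(similar_ids(sample))
```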

Get document

curl -X POST -d \
'json=
[
    "stare-en",
    "159494598"
]' \
http://10.50.30.17:8080/services/docsim/getDocument

The above command returns JSON annotations structured like this:

{
    rawBody: "FELONY ARRESTS ESCAMBIA COUNTY The following suspects were charged with felonies Tuesday at Escambia County Jail. Names, ages and addresses were provided by the individuals. Vanessa Ball , 24, address unavailable, resisting an officer. Tommy Wayne Barrows , 40, 200 block of West Detroit Avenue, larceny, fraud. David Alan Blanchford , 37, address unavailable, two counts of larceny.    Wendy Michele Caraway , 41, 9000 block of North Century Boulevard, Century, marijuana possession, ...",
    lastModifiedDate: "2018-01-25T00:47:36.602Z",
    creationDate: "2018-01-25T00:47:36.602Z",
    publicationDate: "2018-01-24T16:44:45.000Z",
    body: "feloni arrest escambia counti suspect charg feloni escambia counti jail name ag address provid individu vanessa ball address unavail resist offic tommi wayn barrow block west detroit avenu larceni fraud david alan blanchford address unavail count larceni wendi michel carawai block north centuri boulevard centuri marijuana possess smuggl contraband daniel ford address unavail move traffic violat marijuana possess uylessi limain foster davi address unavail flee elud polic timothi shawn frazier address unavail count larceni forgeri melani ann gill address unavail larceni angi mari gunslei block atlanta avenu larceni desmond henderson address unavail flee elud polic paul thien hoang address unavail marijuana possess drug equip possess troi lee jackson block cobb lane aggrav assault davariu lamar johnson block south edgewood circl move traffic violat resist offic cocain possess marijuana possess ...",
    title: "Escambia and Santa Rosa felony and DUI arrests for Tuesday, Jan. 23"
}

Returns the contents of a single document by ID.

HTTP POST Request

http://10.50.30.17:8080/services/docsim/getDocument

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier of the document - already indexed in the store.

Response Properties

Property Description
rawBody Original form of the document that is retrieved from original source.
lastModifiedDate Date of last modification of the content of the document.
creationDate The date of creation of the document inside the internal index.
publicationDate The date of publication of document in the original source.
body Stemmed form of the document. This form is used for similarity analysis.
title Title of the document
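
The three date properties can be compared once parsed. The sketch below parses the timestamp format shown in the sample and computes the time between publication at the source and creation in the internal index. It assumes the trailing Z denotes UTC and that fractional seconds are always present. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp such as "2018-01-25T00:47:36.602Z" into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def indexing_delay(document):
    """Time between publication at the source and creation in the internal index."""
    published = parse_timestamp(document["publicationDate"])
    created = parse_timestamp(document["creationDate"])
    return created - published


# Date values copied from the sample response above.
document = {
    "publicationDate": "2018-01-24T16:44:45.000Z",
    "creationDate": "2018-01-25T00:47:36.602Z",
}
delay = indexing_delay(document)
print(delay)
```
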

Similarity search

curl -X POST -d \
'json=
[
    "stare-en",
    "West Michigan Avenue, three counts of trespassing, two counts of moving traffic violation. Lakendal Lashire Wilson , 40, 9600 block of North Palafox Street, moving traffic violation. SANTA ROSA COUNTY The following suspects were charged with felonies Tuesday at Santa Rosa County Jail."
]' \
http://10.50.30.17:8080/services/docsim/similaritySearch

The above command returns JSON annotations structured like this:

[
    {
        docId: "155813544",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "158611878",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "159141386",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "156165591",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "155203615",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    }
]

Returns a list of similar documents for a text input. In this case the search string is arbitrary and is not expected to be part of any of the indexed documents.

HTTP POST Request

http://10.50.30.17:8080/services/docsim/similaritySearch

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
text The text to be used as search string.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId The unique identifier of similar document.
annotationClass null The class label as defined in classifier model.
annotationType docsim The type of the annotation according to the business task.

Data Loader

The Data Loader is an ETL (extract, transform, load) task that provides consistent textual data for the Staging Repository and the Document Similarity services by communicating with the ADP Elastic index. The data is stored in a Lucene index, a semantic vector store and a graph database. The metadata is stored in the graph database; the content of the documents is stored in a Lucene index and is later queried by the classifier service. The Data Loader depends on the Text Cleaner and Language Detection services to provide a higher level of data consistency.

Get Elastic Document

In the following snippet, the document language is Japanese.

curl -X POST -d \
'json=
[
    "elasticprod",
    "257601622",
    "item_id"
]' \
http://10.50.30.12:8082/dataloader/getElasticDocument

The above command returns JSON structured like this:

{
    creationDate: "2018-08-31T10:44:31.403Z",
    id: "257601622",
    documentLanguage: "ja",
    annotationSource: null,
    sourceTypeIsSocial: "false",
    sourceTypeName: "Traditional Media",
    url: "https://www.asahi.com/articles/ASL806227L80UHBI01T.html?ref=rss",
    publicDate: "2018-08-31T10:44:24.618Z",
    updateDate: "2018-08-31T10:44:31.403Z",
    sourceName: "asahi.com",
    body: " 日本でも人気の清涼飲料「エナジードリンク」について、英政府はイングランド地域での未成年への販売を禁止する方針を明らかにした。対象年齢を16歳までとするか18歳までかなどについて、意見を11月まで公募し、制度設計を進める。  エナジードリンクは砂糖やカフェインを多く含む。大量に飲んだ場合、肥満や睡眠障害など健康に影響が出ると指摘されている。  政府案では、販売禁止の対象を…",
    title: "エナジードリンク、未成年への販売禁止へ 英政府"
}

Returns metadata about the document's origin together with its content (loosely cleaned).

HTTP POST Request

http://10.50.30.12:8082/dataloader/getElasticDocument

Query Parameters

Parameter Description
context Context name. Contexts are predefined and project-specific, because settings vary widely between projects.
field value Lucene lookup field value.
field type Lucene lookup field type.

Response Properties

Property Default Description
creationDate null Document creation date.
id null Document unique identifier.
documentLanguage null Document language property.
annotationSource null Document source property.
sourceTypeIsSocial null Document social media boolean property.
sourceTypeName null Document source type name property.
url null Document url property.
publicDate null Document publication date.
updateDate null Document last updated date.
sourceName null Document source name property.
body null Document content/text.
title null Document title.
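
In the sample response, sourceTypeIsSocial arrives as the string "false" rather than a JSON boolean. In Python any non-empty string is truthy, so a plain truth test would read "false" as true. A small sketch of a safer interpretation follows. The function name is_social is hypothetical, and accepting only "true" (case-insensitive) as true is an assumption.

```python
def is_social(document):
    """Interpret sourceTypeIsSocial, which the sample returns as the string "false"."""
    value = document.get("sourceTypeIsSocial")
    if value is None:
        return None  # property not populated (the table lists null as the default)
    if isinstance(value, bool):
        return value  # tolerate a real boolean as well
    return str(value).strip().lower() == "true"


# Fields copied from the sample response above.
document = {"id": "257601622", "documentLanguage": "ja", "sourceTypeIsSocial": "false"}
print(is_social(document))
```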

Apply NER to elastic document

In the following snippet, the language parameter is set to Chinese ("zh-cn").

curl -X POST -d \
'json=
[
    "elasticprod",
    "zh-cn",
    "257601622",
    "item_id"
]' \
http://10.50.30.12:8082/dataloader/applyNerToElasticDocument

The above command returns JSON structured like this, the same as for getAnnotations:

[
    {
        word: "英政府",
        annotationURI: null,
        documentId: null,
        index: 12,
        annotationSource: null,
        startOffset: 27,
        endOffset: 30,
        annotationClass: "PERSON",
        annotationId: null,
        annotationType: "ner"
    },
    {
        word: "16",
        annotationURI: null,
        documentId: null,
        index: 3,
        annotationSource: null,
        startOffset: 69,
        endOffset: 71,
        annotationClass: "NUMBER",
        annotationId: null,
        annotationType: "ner"
    },
    {
        word: "11月",
        annotationURI: null,
        documentId: null,
        index: 11,
        annotationSource: null,
        startOffset: 94,
        endOffset: 97,
        annotationClass: "DATE",
        annotationId: null,
        annotationType: "ner"
    }
]

Applies the loose-clean text-cleaner pipeline to the content of an Elastic document and calls getAnnotations. Returns the document's NER tags.

HTTP POST Request

http://10.50.30.12:8082/dataloader/applyNerToElasticDocument

Query Parameters

Parameter Description
context Context name. Contexts are predefined and project-specific, because settings vary widely between projects.
language Language corresponding to the NER context name (en, bg, fr). If null, langDetect is used.
field value Lucene lookup field value.
field type Lucene lookup field type.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId null The unique identifier of document.
annotationClass The class label as defined in classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of text corresponding to the mention of entity.
index Word position of entity mention inside text.
startOffset Character position of the beginning of entity mention inside text.
endOffset Character position of the end of entity mention inside text.
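
Once the NER tags are returned, a common next step is to group the mentions by class. The sketch below does that for the three entities in the sample response. The function name words_by_class is hypothetical, and the input is assumed to be the already parsed JSON list.

```python
from collections import defaultdict


def words_by_class(annotations):
    """Group entity mentions by annotationClass, keeping the response order."""
    grouped = defaultdict(list)
    for annotation in annotations:
        if annotation.get("annotationType") != "ner":
            continue  # ignore non-NER annotations
        grouped[annotation["annotationClass"]].append(annotation["word"])
    return dict(grouped)


# The three entities from the sample response above, reduced to the fields used here.
annotations = [
    {"word": "英政府", "annotationClass": "PERSON", "annotationType": "ner"},
    {"word": "16", "annotationClass": "NUMBER", "annotationType": "ner"},
    {"word": "11月", "annotationClass": "DATE", "annotationType": "ner"},
]
print(sorted(words_by_class(annotations)))
```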

Create DataSet From Search String

In the following snippet, the language is German.

curl -X POST -d \
'json=
[
    "stare",
    "source_id:(19278 OR 19276 OR 431077 OR 431269) AND language:\"de\"",
    "Sentiment training dataset for german sources",
    "Contains traditional media documents from german medias"
]' \
http://10.50.30.12:8082/dataloader/createDataSetFromSearchString

The above command returns a status message.

Adds and groups documents into a training dataset (TDS). Document metadata is extracted from the Elastic index and saved in RDF. The content of the documents is saved into a separate Document Similarity context in order to provide accurate similarity scores for a particular task.

HTTP POST Request

http://10.50.30.12:8082/dataloader/createDataSetFromSearchString

Query Parameters

Parameter Description
context Context name. Contexts are predefined and project-specific, because settings vary widely between projects.
Lucene query Query string in Lucene query syntax that selects the documents.
TDS label Label/name of training data set.
TDS definition Definition of training data set.
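
The query in the curl example combines a list of source IDs with a language filter. A minimal sketch of building such a query string and the parameter array in Python follows. The helper name source_language_query is hypothetical, and the field names source_id and language are taken from the example rather than from a documented schema.

```python
import json


def source_language_query(source_ids, language):
    """Build a query string like the one in the curl example above."""
    sources = " OR ".join(str(source_id) for source_id in source_ids)
    return f'source_id:({sources}) AND language:"{language}"'


query = source_language_query([19278, 19276, 431077, 431269], "de")
params = [
    "stare",
    query,
    "Sentiment training dataset for german sources",
    "Contains traditional media documents from german medias",
]
# json.dumps escapes the inner double quotes as \" just as written in the curl example.
print(json.dumps(params, indent=4))
```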

Annotations

The Annotation service is the central point for invoking all other prediction services, such as Classifier, NER and Docsim. It is intended to make multiple predictions with just one HTTP request. It also manages the annotation layer of the Identrics ML workflow. Annotation objects are associated with metadata about the content to be analysed. All mentions, positions, types and anything else that can be stated about a particular document are annotations about that document.

Annotate document

curl -X POST -d \
'json=
[
    "demo",
    "0000001",
    "Police are looking a man who has been charged with a fatal stabbing Sunday morning in Pisgah View Apartments. He should be considered armed and dangerous, APD said. Police responded to the complex at 10:32 a.m. after reports of a stabbing, according to APD spokeswoman Christina Hallingse. Officers found 39-year-old Justin Paul Digiacomo, an Asheville resident since 2008, with a wound to the upper torso. Digiacomo died on the scene. Police took out an arrest warrant for Cecil Thorpe, 53, accusing him of second-degree murder, Hallingse said in a press release Sunday evening. Police consider Thorpe to be armed and dangerous. He is described as 5-foot-7 and 150 pounds, with brown eyes, salt-and-pepper colored hair and a beard. Neither Digiacomo nor Thorpe had established residences in Pisgah View but were known to stay there from time to time, Hallingse said. APD asks for anyone with information about the incident or knowledge of Thorpe%27s whereabouts to call police at 828-252-1110 or Crime Stoppers at 828-255-5050."
]' \
http://10.50.30.23:8080/services/annotation/annotate

The above command returns JSON annotations structured like this:

[
    {
        documentId: "0000001",
        annotationClass: "CORMAN",
        annotationSource: "fibep_cat_en_Traditional_media_RAkELd-51",
        annotationId: "0000001_CORMAN",
        annotationType: "taxonomy"
    },
    {
        documentId: "0000001",
        annotationClass: "0",
        annotationSource: "s_en_Traditional_media_Bussiness_SMO-762",
        annotationId: "0000001_0",
        annotationType: "sentiment"
    },
    {
        annotationURI: null,
        endOffset: 158,
        documentId: "0000001",
        index: 8,
        word: "APD",
        startOffset: 155,
        annotationClass: "ORGANIZATION",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_APD_155",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 288,
        documentId: "0000001",
        index: 19,
        word: "Christina Hallingse",
        startOffset: 269,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Christina_Hallingse_269",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 338,
        documentId: "0000001",
        index: 5,
        word: "Justin Paul Digiacomo",
        startOffset: 317,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Justin_Paul_Digiacomo_317",
        annotationType: "ner"
    }
]

Annotates a single document.

HTTP POST Request

http://10.50.30.23:8080/services/annotation/annotate

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier for the document.
text The text of the document to be annotated.
task list The list of prediction services tasks to be executed.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId null The unique identifier of document.
annotationClass The class label as defined in classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of text corresponding to the mention of entity.
index Word position of entity mention inside text.
startOffset Character position of the beginning of entity mention inside text.
endOffset Character position of the end of entity mention inside text.
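
The sample text in the curl example contains "Thorpe%27s": the apostrophe is percent-encoded, presumably because the payload sits inside a single-quoted shell string. When the body is built programmatically, standard form encoding produces the same escape. The sketch below shows this in Python. The function name encode_annotate_body is hypothetical, the optional task list is omitted as in the curl example, and it is an assumption that the server decodes percent-escapes in the form body.

```python
import json
import urllib.parse


def encode_annotate_body(context, document_id, text):
    """Encode the positional parameters of annotate as a json= form body."""
    payload = json.dumps([context, document_id, text], ensure_ascii=False)
    return urllib.parse.urlencode({"json": payload})


body = encode_annotate_body("demo", "0000001", "knowledge of Thorpe's whereabouts")
print(body)

# Decoding the form body recovers the original text, apostrophe included.
decoded = json.loads(urllib.parse.parse_qs(body)["json"][0])
```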

Get annotations

curl -X POST -d \
'json=
[
    "demo",
    "0000001"
]' \
http://10.50.30.23:8080/services/annotation/getAnnotations

The above command returns JSON annotations structured like this:

[
    {
        documentId: "0000001",
        annotationClass: "CORMAN",
        annotationSource: "fibep_cat_en_Traditional_media_RAkELd-51",
        annotationId: "0000001_CORMAN",
        annotationType: "taxonomy"
    },
    {
        documentId: "0000001",
        annotationClass: "0",
        annotationSource: "s_en_Traditional_media_Bussiness_SMO-762",
        annotationId: "0000001_0",
        annotationType: "sentiment"
    },
    {
        annotationURI: null,
        endOffset: 158,
        documentId: "0000001",
        index: 8,
        word: "APD",
        startOffset: 155,
        annotationClass: "ORGANIZATION",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_APD_155",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 288,
        documentId: "0000001",
        index: 19,
        word: "Christina Hallingse",
        startOffset: 269,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Christina_Hallingse_269",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 338,
        documentId: "0000001",
        index: 5,
        word: "Justin Paul Digiacomo",
        startOffset: 317,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Justin_Paul_Digiacomo_317",
        annotationType: "ner"
    }
]

Returns the stored annotations for a document.

HTTP POST Request

http://10.50.30.23:8080/services/annotation/getAnnotations

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier for the document.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId null The unique identifier of document.
annotationClass The class label as defined in classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of text corresponding to the mention of entity.
index Word position of entity mention inside text.
startOffset Character position of the beginning of entity mention inside text.
endOffset Character position of the end of entity mention inside text.
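
In the sample, startOffset and endOffset behave as zero-based character positions with an exclusive end: slicing the annotated text from 155 to 158 yields "APD". The sketch below checks a stored annotation against its text in that way. The function name mention_matches is hypothetical.

```python
def mention_matches(text, annotation):
    """Check that the annotated span of `text` equals the annotation's word."""
    start, end = annotation["startOffset"], annotation["endOffset"]
    # Offsets in the sample behave as zero-based with an exclusive end.
    return text[start:end] == annotation["word"]


# Opening of the document text used in the annotate example.
text = (
    "Police are looking a man who has been charged with a fatal stabbing "
    "Sunday morning in Pisgah View Apartments. He should be considered "
    "armed and dangerous, APD said."
)
annotation = {"word": "APD", "startOffset": 155, "endOffset": 158}
print(mention_matches(text, annotation))
```

A check like this is a cheap way to detect annotations whose offsets were computed against a differently cleaned version of the text.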

Add annotations

curl -X POST -d \
'json=
[
    "demo",
    "0000001",
    [
        {
            "documentId":"0000001",
            "annotationClass":"CORMER",
            "annotationId":"0000001_CORMER",
            "annotationSource":"fibep_cat_en_Traditional_media_RAkELd-51",
            "annotationType":"multilabel"
        },
        {
            "documentId":"0000001",
            "annotationURI":null,
            "endOffset":1522,
            "index":26,
            "word":"Aircraft_factory 558",
            "startOffset":1501,
            "annotationClass":"ORGANIZATION",
            "annotationId":"0000001_Aircraft_factory_558_1501",
            "annotationSource":"en-model-1.ser.gz",
            "annotationType":"ner"
        }
    ]
]' \
http://10.50.30.23:8080/services/annotation/addAnnotations

The above command returns a status message.

Stores annotations for a document.

HTTP POST Request

http://10.50.30.23:8080/services/annotation/addAnnotations

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier for document.
annotations List of annotation objects in JSON format.

Annotation properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model or e-mail address of the annotator.
documentId null The unique identifier of document.
annotationClass The class label as defined in classifier model.
annotationType multiclass The type of the annotation according to the business task.
word Exact span of the text corresponding to the mention of entity.
index Word position of entity mention inside text.
startOffset Character position of the beginning of entity mention inside text.
endOffset Character position of the end of entity mention inside text.
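
The annotationId values in the samples appear to follow a pattern: document ID plus class label for classifier annotations, and document ID plus word (spaces replaced by underscores) plus start offset for entity mentions. The sketch below reproduces that pattern. It is inferred from the sample values only, not from a documented rule, and the function name make_annotation_id is hypothetical.

```python
def make_annotation_id(annotation):
    """Compose an annotationId following the pattern seen in the samples."""
    parts = [annotation["documentId"]]
    if annotation.get("annotationType") == "ner":
        # Entity mentions: document ID, word (spaces as underscores), start offset.
        parts.append(annotation["word"].replace(" ", "_"))
        parts.append(str(annotation["startOffset"]))
    else:
        # Classifier labels: document ID and class label.
        parts.append(annotation["annotationClass"])
    return "_".join(parts)


# Values copied from the addAnnotations request above.
label = {"documentId": "0000001", "annotationType": "multilabel", "annotationClass": "CORMER"}
entity = {
    "documentId": "0000001",
    "annotationType": "ner",
    "word": "Aircraft_factory 558",
    "startOffset": 1501,
}
print(make_annotation_id(label))
print(make_annotation_id(entity))
```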

Errors

The Identrics API uses the following error codes:

Error Code Meaning
400 Bad Request -- Your request is invalid.
401 Unauthorized -- Your API key is wrong.
403 Forbidden -- The service requested is hidden for administrators only.
404 Not Found -- The specified service could not be found.
405 Method Not Allowed -- You tried to access a service with an invalid method.
406 Not Acceptable -- You requested a format that isn't JSON.
410 Gone -- The service requested has been removed from our servers.
429 Too Many Requests -- You're sending too many requests.
500 Internal Server Error -- We had a problem with our server. Try again later.
503 Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
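
A client can use these codes to decide whether a failed request is worth repeating. The sketch below treats 500 and 503 as retryable because their descriptions say to try again later. Treating 429 the same way, and the exponential backoff schedule, are assumptions rather than documented behaviour. Both function names are hypothetical.

```python
# Status codes from the table above whose descriptions suggest trying again later.
# Treating 429 as retryable is an assumption, not something the table states.
RETRYABLE = {429, 500, 503}


def should_retry(status_code):
    """Return True when a failed request is worth repeating after a pause."""
    return status_code in RETRYABLE


def backoff_delays(attempts, base_seconds=1.0):
    """Exponential backoff schedule in seconds: base, 2 * base, 4 * base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(attempts)]


print(should_retry(503), should_retry(404))
print(backoff_delays(3))
```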