
Introduction

Welcome to the Identrics machine learning services documentation. You can use the API endpoints described here to access a range of machine learning models and services.

We have language bindings in curl, Python, Java and JavaScript. You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.

Contexts service

Provides information about the application contexts loaded in the current Trinity instance. In most cases, application contexts hold information about the machine learning models loaded on that instance. For example, the application context named "t_bg_Traditional_media_Bussiness_RAkELd-952" corresponds to a multi-label topic category classifier for Bulgarian traditional media business content.

Get application contexts

Returns the context metadata

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST http://identrics.net:8080/services/contexts/getApps

The above command returns JSON structured like this:

[
    {
        "comment": "You can clone this view to generate custom views.",
        "description": "Trinity customization for Classifier for multi-label tagger and sentiment analysis.",
        "title": "Classifier service context",
        "author": "Deyan Peychev",
        "collections": 0,
        "thumbnail": "media/logo.png",
        "name": "cat_bg_Traditional_media_80"
    },
    {
        "comment": "You can clone this model to generate custom models",
        "description": "Sentiment classifier for English traditional media business content",
        "title": "Sentiment classifier for English traditional media business content",
        "author": "Deyan Peychev",
        "collections": 0,
        "thumbnail": "media/logo.png",
        "name": "s_en_Traditional_media_Bussiness_SMO-707"
    },
    {
        "comment": "You can clone this model to generate custom models",
        "description": "Sentiment classifier for Bulgarian social media business content",
        "title": "Sentiment classifier for Bulgarian social media busines content",
        "author": "Deyan Peychev",
        "collections": 0,
        "thumbnail": "media/logo.png",
        "name": "s_bg_Social_media_Bussines_SMO-92"
    }
]

HTTP Request

http://identrics.net:8080/services/contexts/getApps

Response Properties

Property Description
comment The comment for the application context.
description Description for the application context.
title Title of the application context.
author Author of the application context.
collections The number of data collections assigned to the application context.
thumbnail Thumbnail.
name System name of the application context. This name is used in all other API calls to refer to the context in requests.
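
The same request can also be prepared from Python. The following is a minimal sketch using only the standard library: it builds the request object and does not send it. The helper name `build_contexts_request` is hypothetical, and the host and port are simply copied from the curl example above.

```python
import urllib.request

# Host and port copied from the curl example; adjust for your deployment.
BASE_URL = "http://identrics.net:8080/services"


def build_contexts_request(api_key: str, endpoint: str = "getApps") -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the contexts service."""
    return urllib.request.Request(
        url=f"{BASE_URL}/contexts/{endpoint}",
        headers={"X-Trinity-Apikey": api_key},
        method="POST",
    )


request = build_contexts_request("YOUR_TOKEN_HERE")
print(request.get_method(), request.full_url)

# To actually call the service (needs network access and a valid key):
# import json
# with urllib.request.urlopen(request) as response:
#     contexts = json.load(response)
```

Passing `"getAppNames"` as the second argument prepares the request for the context names endpoint instead.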

Get application context names

Returns the context names

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST http://identrics.net:8080/services/contexts/getAppNames

The above command returns JSON array:

[
    "t_bg_Traditional_media_Bussiness_RAkELd-952",
    "classifier",
    "cat_bg_Traditional_media_80",
    "s_en_Traditional_media_Bussiness_SMO-707",
    "s_bg_Social_media_Bussines_SMO-92",
    "t_bg_Traditional media_Automotive_SMO_0.899",
    "t_bg_Social media_Pharma_SMO_0.617",
    "t_bg_Traditional media_Automotive_SMO_0.927",
    "fibep_cat_en_Traditional_media_RAkELd-51",
    "s_en_Traditional_media_Bussiness_SMO-743",
    "t_bg_Social media_Pharma_SMO_0.901",
    "s_en_Traditional_media_Bussiness_SMO-796",
    "nace_bg_Traditional_media_937",
    "ame_en_Traditional_media_SMO_927",
    "fibep_sentiment_en_Traditional_media_DL4J-87",
    "s_bg_Traditional_media_Bussiness_SMO-81",
    "s_en_Traditional_media_Bussiness_SMO-762"
]

If the same command is applied to our NER service address, the response JSON would be:

[
    "en",
    "de",
    "id",
    "bg",
    "nl",
    "fr",
    "tr",
    "zh-cn",
    "es",
    "sv"
]

Response Properties

Property Description
message JSON array with the names of all the contexts within a given service.

HTTP Request

http://identrics.net:8080/services/contexts/getAppNames/
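
Once the array of names is available, a client may want to pick the contexts for one language. The sketch below assumes, based only on the sample names above, that the language code appears as an underscore-separated token in the context name. This naming convention is an assumption, not a documented guarantee, and `contexts_for_language` is a hypothetical helper.

```python
import json


def contexts_for_language(names: list[str], language: str) -> list[str]:
    """Return the context names that contain `language` as an underscore-separated token."""
    return [name for name in names if language in name.split("_")]


# A shortened copy of the getAppNames response shown above.
response_body = """[
    "t_bg_Traditional_media_Bussiness_RAkELd-952",
    "classifier",
    "s_en_Traditional_media_Bussiness_SMO-707",
    "fibep_cat_en_Traditional_media_RAkELd-51",
    "nace_bg_Traditional_media_937"
]"""

names = json.loads(response_body)
english = contexts_for_language(names, "en")
print(english)
```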

Language Detection

This is a language detection service. It accepts text and returns the detected language code and a score.

Lang Detect

In the following snippet, the language is Bulgarian.

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "langdetect",
    "Тази кампания предхожда началото на продажбите на модела, които започват в Европа от тази пролет. Хечбекът със системата  quattro е с турбодвигател имащ пет цилиндъра и обем 2.5 литра. Максималният въртящ момент е 450 Nm."
]' \
http://identrics.net:8084/services/langdetect/langDetect

The above command returns a message with the detected language code.

Returns the language code for a given text.

HTTP POST Request

http://identrics.net:8084/services/langdetect/langDetect

Query Parameters

Parameter Description
context Default context name
text The text from which the language will be extracted.

Response Properties

Property Description
message The message contains language code for given text input
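
The curl examples in this document pass their arguments as a JSON array in a form field named `json`. A minimal Python sketch of the same request shape follows. It assumes the service accepts a standard percent-encoded `application/x-www-form-urlencoded` body; `build_json_post` is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_json_post(url: str, api_key: str, arguments: list[str]) -> urllib.request.Request:
    """Prepare a POST whose form field `json` holds the arguments as a JSON array."""
    # json.dumps escapes non-ASCII text, and urlencode percent-encodes the result.
    body = urllib.parse.urlencode({"json": json.dumps(arguments)}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "X-Trinity-Apikey": api_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


arguments = ["langdetect", "Тази кампания предхожда началото на продажбите на модела."]
request = build_json_post(
    "http://identrics.net:8084/services/langdetect/langDetect",
    "YOUR_TOKEN_HERE",
    arguments,
)
print(request.get_method(), request.full_url)
```

The same helper should fit the other POST endpoints in this document, since they differ only in the URL and in the number of array elements.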

Stemming

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word.

Stem

In the following snippet, the language is English.

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "It issued a statement in response to a speech by South Korea President Moon Jae-in on Thursday. Meanwhile, early on Friday North Korea test-fired two missiles into the sea off its eastern coast, the South Korean military said. It is the sixth such test in less than a month."
]' \
http://identrics.net:8080/services/stemmer/stem

The above command returns a message with the stemmed text.

Returns stemmed text for a given text.

HTTP POST Request

http://identrics.net:8080/services/stemming/stem

Query Parameters

Parameter Description
context Context name, typically a language code
text The input text string to which the stemming algorithm will be applied

Response Properties

Property Description
message Stemmed version of the input text string

Ignorelist

This is an example of ignorelist for common words in English

and
or
is
are
last
let
you
yeah

In computing, stop words are words which are filtered out before or after processing of natural language data (text). Stop words are generally the most common words in a language. There is no single universal list of stop words used by all natural language processing tools, and not all tools even use such a list. Some tools avoid removing stop words in order to support phrase search, classification or vectorization.
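
To make the idea concrete, here is a small local illustration that filters a sentence against the example ignorelist shown above. It is not the service's implementation: the tokenization by `\w+` (which also drops punctuation) and the case-insensitive match are assumptions made for this sketch.

```python
import re

# The example ignorelist shown above.
IGNORELIST = frozenset({"and", "or", "is", "are", "last", "let", "you", "yeah"})


def filter_ignored(text: str, ignorelist: frozenset = IGNORELIST) -> str:
    """Drop every word whose lowercase form is in the ignorelist."""
    words = re.findall(r"\w+", text)  # punctuation is discarded by this tokenizer
    return " ".join(word for word in words if word.lower() not in ignorelist)


print(filter_ignored("You and I are here, or is it the last day?"))
# prints: I here it the day
```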

Filter

In the following snippet, the language is English.

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "It issued a statement in response to a speech by South Korea President Moon Jae-in on Thursday. Meanwhile, early on Friday North Korea test-fired two missiles into the sea off its eastern coast, the South Korean military said. It is the sixth such test in less than a month."
]' \
http://identrics.net:8080/services/ignorelist/filter

The above command returns a message with the filtered text.

Returns a shorter version of the given text, based on the ignorelist used by the given context.

HTTP POST Request

http://identrics.net:8080/services/ignorelist/filter

Query Parameters

Parameter Description
context Context name, typically a language code
text The input text string to which the ignorelist will be applied

Response Properties

Property Description
message Ignorelisted version of the input text string

Text Cleaner

A quick and easy way to fix and clean up text copied and pasted between web services. The service uses langDetect and then executes a pipeline built from multiple stages combined in a strict order. The pipeline can be configured differently depending on the task: for example, special symbols (such as 😊) are kept for sentiment analysis, while text is stemmed to increase document similarity accuracy. The text cleaner can be combined with replaceEntities and removeEntities.

Stages Description Table

Here is an example configuration for text-cleaner pipelines.

context.title=Trinity Text Cleaner
context.description=Text formatting pipelines
context.author=Nikola Velichkov
context.comment=poc1
context.thumbnail = media/logo.png

context.strictClean.pipeline = HTML,URLCLEAN,DUPLICATES,LOWERCASE,SPLIT,IGNORELIST,JOIN,SPLIT,SYNONYMS,JOIN,SPLIT,DICTIONARY,JOIN,REMOVE_PUNCTUATION,REMOVE_SYMBOLS,REMOVE_NUMBERS,STEMM,WHITESPACE
context.looseClean.pipeline = HTML,URLCLEAN,DUPLICATES,WHITESPACE
context.checkIRI.pipeline = URLCLEAN

SPLIT = split
SPLIT.mode = split
SPLIT.input = @xfer
SPLIT.output = @xfer

JOIN = split
JOIN.mode = join
JOIN.input = @xfer
JOIN.output = @xfer

STEMM = stemming-stage
STEMM.language = @language
STEMM.input = @xfer
STEMM.output = @xfer

HTML = html
HTML.input = @xfer
HTML.output = @xfer

IGNORELIST = ignorelist-stage
IGNORELIST.language = @language
IGNORELIST.input = @xfer
IGNORELIST.output = @xfer
...
Name Description
SYNONYMS Synonyms are normalized into their canonical form.
DICTIONARY Rare words removal, only dictionary words are kept.
STEMM Stemming
HTML HTML removal
LOWERCASE Lower casing
REMOVE_PUNCTUATION Punctuation removal
REMOVE_SYMBOLS Special symbols removal
REMOVE_NUMBERS Number removal
DUPLICATES Duplicated sentences removal
URLCLEAN URL removal
QUICKCLEAN Non-alphanumeric removal
WHITESPACE Removes multiple spaces and strips/trims
SPLIT Tokenization
JOIN Detokenization
IGNORELIST Frequent/Rare/Custom words removal

The pipeline definitions of looseClean and strictClean are shown in the example configuration above.
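
The idea of a pipeline definition, a comma-separated list of stage names applied strictly in order, can be sketched locally as follows. The stage functions here are toy stand-ins written for this illustration; the service's real stage implementations are not documented here, and `run_pipeline` is a hypothetical name.

```python
import html
import re
from typing import Callable

# Toy stand-ins for a few stages; NOT the service's actual implementations.
STAGES: dict[str, Callable[[str], str]] = {
    "HTML": lambda text: html.unescape(re.sub(r"<[^>]+>", " ", text)),
    "URLCLEAN": lambda text: re.sub(r"https?://\S+", " ", text),
    "LOWERCASE": str.lower,
    "WHITESPACE": lambda text: re.sub(r"\s+", " ", text).strip(),
}


def run_pipeline(definition: str, text: str) -> str:
    """Apply the comma-separated stages to the text, strictly in the listed order."""
    for name in definition.split(","):
        text = STAGES[name.strip()](text)
    return text


# Same stage order as looseClean, minus DUPLICATES, which this sketch does not implement.
cleaned = run_pipeline("HTML,URLCLEAN,WHITESPACE", "<p>Order at https://dostava.kfc.hr/ &amp;  enjoy!</p>")
print(cleaned)
# prints: Order at & enjoy!
```

Because each stage consumes the previous stage's output, changing the order changes the result, which is why the pipeline definition is order-sensitive.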

Clean

In the following curl http POST sample the language is Croatian

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "text-cleaner",
    "looseClean",
    "<div class=\"message\"><p>Zagrizi svoju omiljenu pile\u0107u poslasticu kako bi vikend bio jo\u0161 bolji! \uD83D\uDE0A</p>\r\n<p>\u010Cekanje u redu? Putovanje do restorana? Ma nema potrebe, dovoljno je samo par klikova i sti\u017Ee tvoja omiljena KFC hrana!\uD83E\uDD70</p>\r\n<p>Naru\u010Di preko https://dostava.kfc.hr/, putem Glova, Pauze ili Wolta te na aplikaciji KFC Hrvatska!</p></div>"
]' \
http://identrics.net:8083/services/text-cleaner/clean
curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "text-cleaner",
    "strictClean",
    "<div class=\"message\"><p>Zagrizi svoju omiljenu pile\u0107u poslasticu kako bi vikend bio jo\u0161 bolji! \uD83D\uDE0A</p>\r\n<p>\u010Cekanje u redu? Putovanje do restorana? Ma nema potrebe, dovoljno je samo par klikova i sti\u017Ee tvoja omiljena KFC hrana!\uD83E\uDD70</p>\r\n<p>Naru\u010Di preko https://dostava.kfc.hr/, putem Glova, Pauze ili Wolta te na aplikaciji KFC Hrvatska!</p></div>"
]' \
http://identrics.net:8083/services/text-cleaner/clean

HTTP POST Request

http://identrics.net:8083/services/text-cleaner/clean

Query Parameters

Parameter Description
context Default context name
pipeline Name of the pipeline to run, for example looseClean (defined as context.looseClean.pipeline)
text The input text string to be processed by the pipeline

Response Properties

Property Description
message Cleaned version of the input text string

Classification

The general purpose of a classifier is to assign (categorical) class labels to a particular document. The set of classes can be a list of categories, a taxonomy hierarchy, a sentiment scale or just a boolean true/false. In most cases the set of classes is defined by the client. The Identrics classifier supports three different classification tasks, depending on what kind of result representation is expected in the prediction output.

Binary classification task

Binary classification is the problem of classifying content into one of two classes. Typical binary classification scenarios include filtering undesired messages in a mailbox (“spam versus ham”) or deciding which documents in a data set are, or are not, relevant to a specific topic of interest.

Multi-class classification task

Multi-class classification is the problem of classifying content into one of three or more classes. A typical multi-class scenario is sentiment analysis, using three or more fixed classes for positive, neutral and negative mentions of entities in the document text, or for the general sentiment of the whole document.

Multi-label classification task

Multi-label classification is the problem of assigning a set of one or more labels to content. It is a generalization of multi-class classification, which is the single-label problem: in the multi-label problem there is no constraint on how many classes an instance can be assigned to. A typical multi-label scenario is a document tagger that uses a set of categories from a client taxonomy or just a desired list of tags.
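
The difference between the three tasks is easiest to see in the shape of the prediction. The sketch below is purely illustrative: the per-class scores, the 0.5 threshold and the function names are assumptions made for this example, and the classify endpoint itself returns class labels rather than scores.

```python
def binary_label(score: float, threshold: float = 0.5) -> bool:
    """Binary: one score, one yes/no decision."""
    return score >= threshold


def multiclass_label(scores: dict[str, float]) -> str:
    """Multi-class: exactly one class, the highest-scoring one."""
    return max(scores, key=lambda label: scores[label])


def multilabel_labels(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label: every class whose score clears the threshold (possibly none)."""
    return [label for label, score in scores.items() if score >= threshold]


print(binary_label(0.8))
# prints: True
print(multiclass_label({"positive": 0.2, "neutral": 0.5, "negative": 0.3}))
# prints: neutral
print(multilabel_labels({"automotive": 0.9, "pharma": 0.1, "business": 0.7}))
# prints: ['automotive', 'business']
```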

Classify text

In the following curl http POST sample the model name is "t_bg_Traditional media_Automotive_SMO_0.927"

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "t_bg_Traditional media_Automotive_SMO_0.927",
    "Тази кампания предхожда началото на продажбите на модела, които започват в Европа от тази пролет. Хечбекът със системата quattro е с турбодвигател имащ пет цилиндъра и обем 2.5 литра. Максималният въртящ момент е 450 Nm."
]' \
http://identrics.net:8085/services/classifier/classify

The above command returns JSON structured like this:

[
    {
        "annotationId" : null,
        "annotationSource" : "t_bg_Traditional media_Automotive_SMO_0.927",
        "documentId" : null,
        "annotationClass" : "http://identrics.net/resource/category/PROSAL",
        "annotationType" : "taxonomy"
    }
]

In the following curl HTTP POST sample, the model name is "fibep_sentiment_en_Traditional_media_DL4J-87"

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "fibep_sentiment_en_Traditional_media_DL4J-87",
    "Saudi Arabia has said it will carry out urgent reprisals as it accused Iran of being behind a late-night cruise missile attack by Houthi rebel fighters on a Saudi international airport that injured 26 people. The Saudi foreign ministry said the Command of Joint Forces of the Coalition promised it \"will take urgent and timely measures to deter these Iranian-backed terrorist Houthi militias\". The attack on Abha airport was condemned across the Middle East and by the US defence department. The Saudi-backed Yemeni government, which has been fighting a four-year civil war against the Houthi rebels, claimed the missile directed at the airport had been supplied by Iran, even claiming Iranian experts were present at the missiles launch. The latest major Trump resignations and firings Read more Iran strongly denies Saudi claims of aiding the Houthi movement. The Houthi rebels insist they have a right to defend themselves from a Saudi directed blockade, and reported an initial Saudi reprisal that hit densely populated areas in the north of the country. Diplomats will fear that the conflict in Yemen is spilling over into the dispute between Washington and Tehran, particularly if the US backs claims that Iran is directing the increasingly sophisticated Houthi attacks deep into Saudi territory. A Houthi military spokesman promised the group would target every airport in Saudi Arabia and that the coming days would reveal \"big surprises\". No fatalities were reported in the airport attack, which hit the arrivals hall, but the number of civilians wounded was the largest in any Houthi attack inside Saudi Arabia. The Houthis al-Masirah satellite news channel said the missile hit its intended target, Abha airport, near the Yemen border, disrupting flights. The rebels have also carried out drone strikes on Saudi oil installations and may have been responsible for recent attacks on oil tankers off the coast of the United Arab Emirates. 
A UAE-led investigation into the shipping attacks was unable to identify the culprits, but said a state-supported actor was involved. The precise extent to which Iran is providing military assistance to the Houthi movement is a matter of dispute, but UN reports suggest it has provided weaponry. Iran operates through surrogates, but has looked as if it was seeking ways to reduce tensions with the US. The Japanese prime minister, Shinzo Abe, arrived in Tehran on Wednesday, carrying what Iran expects is a message on behalf of Donald Trump that sets out US conditions for direct talks. Iran is threatening to pull out of the 2015 nuclear deal unless the US relaxes economic sanctions that are crippling the the countrys economy. Trump pulled the US out of the deal last year. A spokesman for the Saudi-led coalition, Turki al-Maliki, was quoted by the state-run Al Ekhbariya news channel as saying three women and two children were among those hurt, and that eight people were taken to hospital while 18 sustained minor injuries. A Houthi spokesman, Mohamed Abdel Salam, said the attack was in response to Saudi Arabias \"continued aggression and blockade on Yemen\". Earlier in the week, he said attacks on Saudi airports were \"the best way to break the blockade\" of the airport in Yemens capital, Sanaa, which the rebels overran in late 2014. Tens of thousands of civilians have been killed in the conflict since, relief agencies say."
]' \
http://identrics.net:8085/services/classifier/classify

The above command returns JSON structured like this:

[
    {
        "annotationId" : null,
        "annotationSource" : "fibep_sentiment_en_Traditional_media_DL4J",
        "documentId" : null,
        "annotationClass" : "-1",
        "annotationType" : "sentiment"
    }
]

Returns the predicted classes for a given machine learning model and text input.

HTTP POST Request

http://identrics.net:8085/services/classifier/classify

Query Parameters

Parameter Description
model name The name of machine learning model used for class prediction.
text The text to be classified.

Response Properties

Property          Default   Description
annotationId      null      Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource  null      The name of prediction model.
documentId        null      The unique identifier of document.
annotationClass             The class label as defined in classifier model.
annotationType    taxonomy  The type of the annotation according to the business task.
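
A client will typically parse the returned array into typed records. The following sketch does this for the classify response shown above. The `Annotation` dataclass and `parse_annotations` helper are hypothetical names; the field names are copied from the response properties.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One element of the classify response; field names mirror the JSON keys."""
    annotationId: Optional[str]
    annotationSource: Optional[str]
    documentId: Optional[str]
    annotationClass: str
    annotationType: str


def parse_annotations(body: str) -> list[Annotation]:
    """Turn the JSON array returned by the classify endpoint into Annotation records."""
    return [Annotation(**item) for item in json.loads(body)]


# The first sample response from this section.
sample = """[
    {
        "annotationId": null,
        "annotationSource": "t_bg_Traditional media_Automotive_SMO_0.927",
        "documentId": null,
        "annotationClass": "http://identrics.net/resource/category/PROSAL",
        "annotationType": "taxonomy"
    }
]"""

annotations = parse_annotations(sample)
print([a.annotationClass for a in annotations])
```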

Named Entity Recognition

Named Entity Recognition (NER) is an information extraction task that locates and classifies entities in text. Typical classes are Person, Organization, Location, Time, Money, Percentage, etc. Entity classes are not limited to those mentioned: in principle they can be customized depending on the domain of knowledge and the specific information to be discovered. The types of extracted entities depend mostly on the training dataset. If a manually annotated corpus is designed to extract market products, employee positions or chemical compounds, those types should be explicitly marked. In general, any entity type can be specified for extraction if it is manually annotated in the training dataset.

Recognize Entities

In the following curl HTTP POST sample, the language is Bulgarian

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "bg",
    "Тази кампания предхожда началото на продажбите на модела, които започват в Европа от тази пролет. Хечбекът със системата quattro е с турбодвигател имащ пет цилиндъра и обем 2.5 литра. Максималният въртящ момент е 450 Nm."
]' \
http://identrics.net:8082/services/ner/getAnnotations

The above command returns JSON response structured like this:

[
    {
        "annotationURI": null,
        "index": 12,
        "annotationSource": null,
        "documentId": null,
        "word": "Европа",
        "annotationClass": "LOCATION",
        "endOffset": 81,
        "startOffset": 75,
        "annotationId": null,
        "annotationType": "ner"
    }
]

Entity types may vary for the different NER models

[
    "PERSON",
    "LOCATION",
    "ORGANIZATION",
    "PRODUCT",
    "MONEY",
    "DATE",
    "NUMBER",
    "MISC",
    "GPE"
]

Returns entities and their classes extracted from text.

HTTP POST Request

http://identrics.net:8082/services/ner/getAnnotations

Query Parameters

Parameter Description
language Natural language code - "en" for English, "fr" for French etc.
text The text from which entities are extracted.

Response Properties

Property          Default   Description
annotationId      null      Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource  null      The name of the prediction model.
documentId        null      The unique identifier of the document.
annotationClass             The class label as defined in the NER model.
annotationType    ner       The type of the annotation according to the business task.
word                        Exact portion of text corresponding to the mention of the entity.
index                       Word position of the entity mention inside the text.
startOffset                 Character position of the beginning of the entity mention inside the text.
endOffset                   Character position of the end of the entity mention inside the text.
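
In the sample response above, the offsets behave like a zero-based start and an exclusive end: slicing the input text with them yields the `word` value. The short check below uses the beginning of the sample text. Whether offsets count Unicode code points or UTF-16 units is not stated here, so treat the slicing as an assumption for text containing characters such as emoji.

```python
# Beginning of the Bulgarian sample text sent in the request above.
text = (
    "Тази кампания предхожда началото на продажбите на модела, "
    "които започват в Европа от тази пролет."
)

# Relevant fields of the annotation returned for that text.
annotation = {"word": "Европа", "annotationClass": "LOCATION", "startOffset": 75, "endOffset": 81}


def mention(text: str, annotation: dict) -> str:
    """Slice the entity mention out of the text using the returned offsets."""
    return text[annotation["startOffset"]:annotation["endOffset"]]


print(mention(text, annotation))
# prints: Европа
```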

Supported languages

English, German, Indonesian/Bahasa, Bulgarian, Dutch, French, Turkish, Simplified Chinese, Spanish and Swedish. The language codes correspond to the context names.

Replace Entities

In the following curl HTTP POST sample, the language is English

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "String text = BUCHAREST (Romania), December 28 (SeeNews) - Moldovas designated prime-minister A. Sturza said that he will present the members of his cabinet and his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday morning, and on Sunday they will sign their statements on the assets they currently own, news agency Moldpress quoted Sturza as saying after a discussion with parliament president Andrian Candu. Sturza also said he will ask the Permanent Bureau - a body of nine members representing the main political formations in Moldovas parliament - to set the date for a vote of confidence. On Thursday Sturza said he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick to the planned date. Moldovan president Nicolae Timofti nominated Sturza, a businessman, as prime minister on December 21. Under Moldovas Constitution, the countrys parliament should vote on the nomination within 15 days. The main political formations in Moldovas parliament are The Alliance for European Moldova, formed by the Liberal Democrat Party Moldova, PLDM, which supports Sturza, and the Democratic Party Moldova, PD, which is against him. Last week, PD and 14 MPs who left the Communist Party last week formed the Social Democratic Platform, which now has 34 deputies out of a total of 101 and claims the prime ministers seat. Moldova remained without a government at the end of October, when the cabinet led by PLDM vice-president Valeriu Strelet collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the Moldovan Republic Party, PSRM, and the Communist Party, PCRM, who accused Strelet of abuse of power and corruption. 
The no-confidence vote came after on October 15, Vlad Filat, leader of PLDM and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished Southeast European country. At the end of November, Moldovas Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29.Sturza served as prime minister of Moldova for nine months in 1999, appointed by then president Petru Lucinschi. After his resignation as prime minister he returned to business and founded several companies, including Rompetrol Moldova in 2002 and Fribourg Capital Investment Fund. According to Top 300 The Wealthiest Men in Romania 2015 published by Romanias Capital magazine, Sturza has a fortune of 38-40 million euro."
]' \
http://identrics.net:8082/services/ner/replaceEntities

The above command returns a modified text string:

"String text = BUCHAREST (LOCATION), December 28 (SeeNews) - LOCATION designated
 prime-minister PERSON said that he will present the members of his cabinet and
 his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday morning, and on Sunday they will sign their statements
 on the assets they currently own, news agency ORGANIZATION quoted PERSON as saying after a discussion with parliament president PERSON. PERSON also said he will ask the ORGANIZATION - a body of nine members representing the main political formations in LOCATION
 parliament - to set the date for a vote of confidence. On Thursday PERSON said
 he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick to the planned date. LOCATION president PERSON nominated PERSON, a businessman, as prime minister on December 21. Under LOCATION Constitution, the countrys parliament should vote on the nomination within 15 days. The main political formations in LOCATION
 parliament are The LOCATION for ORGANIZATION, formed by the ORGANIZATION, ORGANIZATION, which supports PERSON, and the ORGANIZATION, ORGANIZATION, which is against him. Last week, ORGANIZATION and 14 MPs who left the ORGANIZATION last week formed the ORGANIZATION, which now has 34 deputies out of a total of 101 and claims the prime ministers seat. LOCATION remained without a government at the end of October, when the cabinet led by ORGANIZATION vice-president PERSON collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the LOCATION Republic Party, ORGANIZATION, and the ORGANIZATION, ORGANIZATION, who accused PERSON of abuse of power and corruption. The no-confidence vote came after on October 15, PERSON, leader of ORGANIZATION and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished MISCELLANEOUS country. At the end of November, LOCATION Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29.PERSON served as prime minister of LOCATION for nine months in 1999, appointed by then president PERSON. After his resignation as prime minister he returned to business and founded several companies, including Rompetrol LOCATION in 2002 and ORGANIZATION. According to Top 300 MISCELLANEOUS in LOCATION 2015 published by LOCATIONs Capital magazine, PERSON has a fortune of 38-40 million euro."

Returns the input text with each entity name replaced by its entity type.

HTTP POST Request

http://identrics.net:8082/services/ner/replaceEntities

Query Parameters

Parameter Description
language Natural language code - "en" for English, "fr" for French etc.
text The text in which entities are replaced with their type.
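
For illustration, a comparable replacement can be reproduced on the client side from the offsets that getAnnotations returns. This is a sketch of the idea only, not the service's implementation. The annotations below are written by hand for a short made-up sentence, and `replace_entities` is a hypothetical helper.

```python
def replace_entities(text: str, annotations: list[dict], remove: bool = False) -> str:
    """Replace each annotated span with its class label, or delete it when remove=True."""
    # Work from the last span to the first so that earlier offsets stay valid.
    for ann in sorted(annotations, key=lambda a: a["startOffset"], reverse=True):
        replacement = "" if remove else ann["annotationClass"]
        text = text[:ann["startOffset"]] + replacement + text[ann["endOffset"]:]
    return text


text = "Sturza met Andrian Candu in Moldova."
# Hand-written annotations in the shape of the getAnnotations response.
annotations = [
    {"annotationClass": "PERSON", "startOffset": 0, "endOffset": 6},
    {"annotationClass": "PERSON", "startOffset": 11, "endOffset": 24},
    {"annotationClass": "LOCATION", "startOffset": 28, "endOffset": 35},
]

print(replace_entities(text, annotations))
# prints: PERSON met PERSON in LOCATION.
```

Calling the helper with `remove=True` deletes the spans instead, which mirrors what the removeEntities endpoint is described as doing.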

Remove Entities

In the following snippet, the language is English.

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "String text = BUCHAREST (Romania), December 28 (SeeNews) - Moldovas designated prime-minister A. Sturza said that he will present the members of his cabinet and his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday morning, and on Sunday they will sign their statements on the assets they currently own, news agency Moldpress quoted Sturza as saying after a discussion with parliament president Andrian Candu. Sturza also said he will ask the Permanent Bureau - a body of nine members representing the main political formations in Moldovas parliament - to set the date for a vote of confidence. On Thursday Sturza said he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick to the planned date. Moldovan president Nicolae Timofti nominated Sturza, a businessman, as prime minister on December 21. Under Moldovas Constitution, the countrys parliament should vote on the nomination within 15 days. The main political formations in Moldovas parliament are The Alliance for European Moldova, formed by the Liberal Democrat Party Moldova, PLDM, which supports Sturza, and the Democratic Party Moldova, PD, which is against him. Last week, PD and 14 MPs who left the Communist Party last week formed the Social Democratic Platform, which now has 34 deputies out of a total of 101 and claims the prime ministers seat. Moldova remained without a government at the end of October, when the cabinet led by PLDM vice-president Valeriu Strelet collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the Moldovan Republic Party, PSRM, and the Communist Party, PCRM, who accused Strelet of abuse of power and corruption. 
The no-confidence vote came after on October 15, Vlad Filat, leader of PLDM and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished Southeast European country. At the end of November, Moldovas Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29.Sturza served as prime minister of Moldova for nine months in 1999, appointed by then president Petru Lucinschi. After his resignation as prime minister he returned to business and founded several companies, including Rompetrol Moldova in 2002 and Fribourg Capital Investment Fund. According to Top 300 The Wealthiest Men in Romania 2015 published by Romanias Capital magazine, Sturza has a fortune of 38-40 million euro."
]' \
http://identrics.net:8082/services/ner/removeEntities

The above command returns message like this:

"String text = BUCHAREST (), December 28 (SeeNews) -  designated prime-minister
 said that he will present the members of his cabinet and his government programme on January 2, local media reported on Monday. The list of ministers will be ready by Saturday
 morning, and on Sunday they will sign their statements on the assets they currently own, news agency  quoted  as saying after a discussion with parliament president .  also said he will ask the  - a body of nine members representing the main political formations in
 parliament - to set the date for a vote of confidence. On Thursday  said he will ask parliament for a confidence vote on January 4 but it is unclear if he will stick
 to the planned date.  president  nominated , a businessman, as prime minister on December 21. Under  Constitution, the countrys parliament should vote on the nomination within
 15 days. The main political formations in  parliament are The  for , formed by
 the , , which supports , and the , , which is against him. Last week,  and 14 MPs who left the  last week formed the , which now has 34 deputies out of a total of 101 and claims the prime ministers seat.  remained without a government at the end of October, when the cabinet led by  vice-president  collapsed after losing a no-confidence motion. The motion was filed by 42 MPs of the Socialists of the  Republic Party, , and the , , who accused  of abuse of power and corruption. The no-confidence vote came after on October 15, , leader of  and former prime minister, was detained on suspicions of corruption and complicity in a bank fraud that rattled the economy of the impoverished  country. At the end of November,  Constitutional Court ruled that the president can dissolve parliament if no government is formed within three months, i. e. by January 29. served as prime minister of  for nine months in 1999, appointed by then president . After his resignation as prime minister he returned to business and founded several companies, including Rompetrol  in 2002 and . According to Top 300  in  2015 published by s Capital magazine,  has a fortune of 38-40 million euro."

Returns the given input text with all entities removed.

HTTP POST Request

http://identrics.net:8082/services/ner/removeEntities

Query Parameters

Parameter Description
language Natural language code - "en" for English, "fr" for French etc.
text The text from which the entities are to be removed.

Document similarity

Estimates the degree of similarity between texts. Documents are usually treated as similar if they are semantically close and describe similar concepts. Similarity can also be used for duplicate detection. The Identrics document similarity service does more than similarity estimation: more generally, it is a document repository system intended for the storage, retrieval, indexing and search of text documents.

Create context

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "en"
]' \
http://identrics.net:8081/services/docsim/createContext

The above command returns message:

To create a context for grouping documents (IDs) run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/createContext

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
language Language of the documents, used to select the proper analyzer and so provide better similarity results.

Response Properties

Property Description
message Status message.
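
For readers not using curl, here is a minimal Python sketch of the same request. It assumes the service accepts the positional parameters as a JSON array in a form field named `json`, as the curl example suggests. `build_request` is a hypothetical helper, not part of any Identrics client library, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

# Host and port are taken from the docsim examples in this section.
BASE_URL = "http://identrics.net:8081/services/docsim"


def build_request(method, params, api_key="YOUR_TOKEN_HERE"):
    """Build (but do not send) a POST request for a docsim method.

    Assumption: parameters travel as a JSON array in a form field
    named "json", mirroring `curl -d 'json=[...]'`.
    """
    body = urllib.parse.urlencode({"json": json.dumps(params)}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{method}",
        data=body,
        headers={"X-Trinity-Apikey": api_key},
        method="POST",
    )


request = build_request("createContext", ["stare-en", "en"])
print(request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Percent-encoding the form value is a design choice of this sketch: it keeps quotes and ampersands in document text from breaking the form body.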

Add document

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "0000001",
    "The body of new text document"
]' \
http://identrics.net:8081/services/docsim/add

The above command returns message:

To add the contents of a single document with a unique ID, run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/add

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier of the document to be added.
text Text of the document to be added.

Response Properties

Property Description
message Status message.

Update document

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "0000001",
    "The updated body of an already existing document"
]' \
http://identrics.net:8081/services/docsim/update

The above command returns message:

To update the contents of a single document with a unique ID, run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/update

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier of document - already indexed in the store.
text Text of the document to be updated.

Response Properties

Property Description
message Status message.

Delete document

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "0000001"
]' \
http://identrics.net:8081/services/docsim/delete

The above command returns message:

To delete a single document with a unique ID, run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/delete

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier of document - already indexed in the store.

Response Properties

Property Description
message Status message.

Build vector store

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en"
]' \
http://identrics.net:8081/services/docsim/buildVectoreStore

The above command returns message:

To build the vector store and measure the distances between the documents already added, run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/buildVectoreStore

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.

Response Properties

Property Description
message Status message.
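
The sections above suggest a typical order of calls: create a context, add documents, then build the vector store before querying. The sketch below lays out that sequence as plain data. It assumes the store should be built once after all documents are added, which this documentation does not state explicitly. `indexing_steps` is a hypothetical helper and nothing is sent over the network.

```python
def indexing_steps(context, language, documents):
    """Return the (method, params) pairs needed to index `documents`.

    `documents` maps document IDs to their text. Method names are the
    docsim endpoints documented above.
    """
    steps = [("createContext", [context, language])]
    for doc_id, text in documents.items():
        steps.append(("add", [context, doc_id, text]))
    # Assumption: build the store once, after every document is added.
    steps.append(("buildVectoreStore", [context]))
    return steps


steps = indexing_steps(
    "stare-en", "en", {"0000001": "The body of new text document"}
)
for method, params in steps:
    print(method, params)
```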

Get similar documents [default]

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "159494598"
]' \
http://identrics.net:8081/services/docsim/getSimilar

The above command returns JSON annotations structured like this:


[
    {
        docId: "157423763",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "160949427",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "158611878",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "156433649",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "160293683",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    }
]

To get a list of similar documents (IDs), run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/getSimilar

In this case the request is for a document already indexed in the store.

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier of document - already indexed in the store.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId The unique identifier of similar document.
annotationClass null The class label as defined in classifier model.
annotationType docsim The type of the annotation according to the business task.

Get similar documents [similarity threshold]

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "159494598",
    0.8
]' \
http://identrics.net:8081/services/docsim/getSimilar

The above command returns JSON annotations structured like this:


[
    {
        docId: "157423763",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "160949427",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "158611878",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "156433649",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "160293683",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    }
]

To get a list of similar documents (IDs) above a similarity threshold, run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/getSimilar

In this case the request is for a document already indexed in the store.

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier of document - already indexed in the store.
minimum similarity Documents with score less than the minimum [0.0 - 1.0] will not be retrieved by the method.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId The unique identifier of similar document.
annotationClass null The class label as defined in classifier model.
annotationType docsim The type of the annotation according to the business task.
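
The two getSimilar variants differ only in the optional third parameter. A small sketch of building the parameter list for either case, with the threshold checked against the documented [0.0 - 1.0] range; `get_similar_params` is a hypothetical helper name.

```python
def get_similar_params(context, doc_id, min_similarity=None):
    """Build the positional parameter list for docsim/getSimilar.

    When `min_similarity` is omitted, the two-parameter (default)
    form of the request is produced.
    """
    params = [context, doc_id]
    if min_similarity is not None:
        if not 0.0 <= min_similarity <= 1.0:
            raise ValueError("min_similarity must be within [0.0, 1.0]")
        params.append(min_similarity)
    return params


print(get_similar_params("stare-en", "159494598"))
print(get_similar_params("stare-en", "159494598", 0.8))
```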

Get document

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "159494598"
]' \
http://identrics.net:8081/services/docsim/getDocument

The above command returns JSON annotations structured like this:

{
    rawBody: "FELONY ARRESTS ESCAMBIA COUNTY The following suspects were charged with felonies Tuesday at Escambia County Jail. Names, ages and addresses were provided by the individuals. Vanessa Ball , 24, address unavailable, resisting an officer. Tommy Wayne Barrows , 40, 200 block of West Detroit Avenue, larceny, fraud. David Alan Blanchford , 37, address unavailable, two counts of larceny.    Wendy Michele Caraway , 41, 9000 block of North Century Boulevard, Century, marijuana possession, ...",
    lastModifiedDate: "2018-01-25T00:47:36.602Z",
    creationDate: "2018-01-25T00:47:36.602Z",
    publicationDate: "2018-01-24T16:44:45.000Z",
    body: "feloni arrest escambia counti suspect charg feloni escambia counti jail name ag address provid individu vanessa ball address unavail resist offic tommi wayn barrow block west detroit avenu larceni fraud david alan blanchford address unavail count larceni wendi michel carawai block north centuri boulevard centuri marijuana possess smuggl contraband daniel ford address unavail move traffic violat marijuana possess uylessi limain foster davi address unavail flee elud polic timothi shawn frazier address unavail count larceni forgeri melani ann gill address unavail larceni angi mari gunslei block atlanta avenu larceni desmond henderson address unavail flee elud polic paul thien hoang address unavail marijuana possess drug equip possess troi lee jackson block cobb lane aggrav assault davariu lamar johnson block south edgewood circl move traffic violat resist offic cocain possess marijuana possess ...",
    title: "Escambia and Santa Rosa felony and DUI arrests for Tuesday, Jan. 23"
}

To get the contents of a single document by ID, run the request:

HTTP POST Request

http://identrics.net:8081/services/docsim/getDocument

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier of document - already indexed in the store.

Response Properties

Property Description
rawBody Original form of the document, as retrieved from the original source.
lastModifiedDate Date of last modification of the content of the document.
creationDate The date the document was created in the internal index.
publicationDate The date the document was published in the original source.
body Stemmed form of the document. This form is used for similarity analysis.
title Title of the document

Similarity search

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare-en",
    "West Michigan Avenue, three counts of trespassing, two counts of moving traffic violation. Lakendal Lashire Wilson , 40, 9600 block of North Palafox Street, moving traffic violation. SANTA ROSA COUNTY The following suspects were charged with felonies Tuesday at Santa Rosa County Jail."
]' \
http://identrics.net:8081/services/docsim/similaritySearch

The above command returns JSON annotations structured like this:

[
    {
        docId: "155813544",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "158611878",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "159141386",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "156165591",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    },
    {
        docId: "155203615",
        annotationClass: null,
        annotationId: null,
        annotationSource: null,
        annotationType: "docsim"
    }
]

To get a list of similar documents for a text input, run the request below. In this case the search string is arbitrary and is not expected to be part of any indexed document.

HTTP POST Request

http://identrics.net:8081/services/docsim/similaritySearch

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
text The text to be used as search string.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId The unique identifier of similar document.
annotationClass null The class label as defined in classifier model.
annotationType docsim The type of the annotation according to the business task.
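
A sketch of pulling the document IDs out of a docsim response. It assumes the service returns valid JSON with quoted keys, whereas the samples above are shown without quotes. The samples name the ID property `docId` and the property table lists `documentId`, so the sketch accepts either. `similar_doc_ids` is a hypothetical helper.

```python
import json


def similar_doc_ids(response_text):
    """Extract the IDs of similar documents from a docsim response body."""
    annotations = json.loads(response_text)
    ids = []
    for annotation in annotations:
        # Accept either spelling of the ID property.
        doc_id = annotation.get("docId", annotation.get("documentId"))
        if doc_id is not None:
            ids.append(doc_id)
    return ids


# Two entries shaped like the sample response, serialized as strict JSON.
sample = json.dumps([
    {"docId": "155813544", "annotationType": "docsim"},
    {"docId": "158611878", "annotationType": "docsim"},
])
print(similar_doc_ids(sample))
```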

Data Loader

Data Loader is an ETL (extract, transform, load) task that provides consistent textual data for the Staging Repository and the Document Similarity services by communicating with the ADP Elastic index. The data is stored in a Lucene index, a semantic vector store and a graph database: the metadata is stored in the graph database, while the content of the documents is stored in the Lucene index and is later queried by the classifier service. The Data Loader depends on the Text Cleaner and Language Detection services to provide a higher level of data consistency.

Get Elastic Document

In the following snippet, the retrieved document is in Japanese.

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "elasticprod",
    "257601622",
    "item_id"
]' \
http://identrics.net:8086/dataloader/getElasticDocument

The above command returns JSON structured like this:

{
    creationDate: "2018-08-31T10:44:31.403Z",
    id: "257601622",
    documentLanguage: "ja",
    annotationSource: null,
    sourceTypeIsSocial: "false",
    sourceTypeName: "Traditional Media",
    url: "https://www.asahi.com/articles/ASL806227L80UHBI01T.html?ref=rss",
    publicDate: "2018-08-31T10:44:24.618Z",
    updateDate: "2018-08-31T10:44:31.403Z",
    sourceName: "asahi.com",
    body: " 日本でも人気の清涼飲料「エナジードリンク」について、英政府はイングランド地域での未成年への販売を禁止する方針を明らかにした。対象年齢を16歳までとするか18歳までかなどについて、意見を11月まで公募し、制度設計を進める。  エナジードリンクは砂糖やカフェインを多く含む。大量に飲んだ場合、肥満や睡眠障害など健康に影響が出ると指摘されている。  政府案では、販売禁止の対象を…",
    title: "エナジードリンク、未成年への販売禁止へ 英政府"
}

Returns metadata about the document's origin together with its loosely cleaned content.

HTTP POST Request

http://identrics.net:8086/dataloader/getElasticDocument

Query Parameters

Parameter Description
context Context name. Contexts are predefined and project-specific, because settings vary widely between projects.
field value Lucene lookup field value.
field type Lucene lookup field type.

Response Properties

Property Default Description
creationDate null Document creation date.
id null Document unique identifier.
documentLanguage null Document language property.
annotationSource null Document source property.
sourceTypeIsSocial null Document social media boolean property.
sourceTypeName null Document source type name property.
url null Document url property.
publicDate null Document publication date.
updateDate null Document last updated date.
sourceName null Document source name property.
body null Document content/text.
title null Document title.

Search

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "elasticprod",
    "source_id:(425773)",
    "2020-03-01T22:00:00.000Z",
    "2020-03-30T22:00:00.000Z"
]' \
http://identrics.net:8086/services/dataloader/search

The above command returns a message structured like this:

[
    {
        creationDate: "2020-03-23T21:49:35.732Z",
        url: "https://www.ardrossanherald.com/news/18317657.no-ball-games-signs-north-ayrshire-expected-taken/?ref=rss",
        sourceName: "ardrossanherald.com - RDCAM",
        body: "Council chiefs are expected to agree to take down all ‘No Ball Games’ signs. Cabinet are set to agree to remove the signs – with council chiefs admitting the signs are not legally enforceable and expressing doubt regarding how effective they are in deterring antisocial behaviour. Cabinet are expected to approve the removal of all current ‘No Ball Games’ signs and that no further signs in North Ayrshire are approved. Meeting papers state: “In September 2019, footballer Kris Boyd cited ‘No Ball Games’ signs as one of the reasons that this generation of children no longer play football within their communities. “‘No Ball Games’ signage has been used in an attempt to discourage certain types of low-level antisocial behaviour. A procedure and an approach that considers requests from residents for ‘No Ball Games’ signs within residential areas is also in place.” “There is also some concern regarding the negative impact these signs can have on children exercising their right to play. “Removing ‘No Ball Games’ signs would send an important message to our communities that the council are committed to supporting the right of North Ayrshire’s young people to be able to play within their own estates.”",
        title: "‘No Ball Games’ signs in North Ayrshire expected to be taken down",
        annotationSource: null,
        sourceTypeIsSocial: "false",
        sourceTypeName: "Traditional Media",
        publicDate: "2020-03-23T20:00:00.000Z",
        updateDate: "2020-04-08T22:19:29.354Z",
        documentLanguage: "en",
        id: "771491293"
    },
    {
        creationDate: "2020-03-23T13:53:34.065Z",
        url: "https://www.ardrossanherald.com/news/18326785.two-cars-set-fire-ardrossan-weekend/?ref=rss",
        sourceName: "ardrossanherald.com - RDCAM",
        body: "Police were called to two separate incidents of cars being set alight in Ardrossan over the weekend. The incidents took place at10:46pm on Saturday, March 21 on Glasgow Street and then at 11:26pm on Sunday, March 22 onMontgomerie Street. The incidents are being treated as suspicious. Both blazes were extinguished by the Scottish Fire and Rescue Service at the scene. A spokesperson for Irvine Police said: "At 2246hrs on Saturday 21st March 2020 police were called to reports of a white Ford on fire on Glasgow Street, Ardrossan. This was extinguished by SFRS and is being treated as a wilful fireraising. "At 2326hrs on Sunday 22nd March 2020 police were called to an Audi on fire on Montgomerie St., near to Mariners View. SFRS attended and the fire was extinguished and this is also being treated as wilful. "Enquiries are ongoing in relation to these twocrimes."",
        title: "Two cars set on fire in Ardrossan over the weekend",
        annotationSource: null,
        sourceTypeIsSocial: "false",
        sourceTypeName: "Traditional Media",
        publicDate: "2020-03-23T11:10:00.000Z",
        updateDate: "2020-04-08T22:19:29.354Z",
        documentLanguage: "en",
        id: "771003235"
    },
    {
        creationDate: "2020-03-24T13:52:26.314Z",
        url: "https://www.ardrossanherald.com/news/18329910.clyde-garnock-valley-crematorium-offers-free-webcasts-covid-19/?ref=rss",
        sourceName: "ardrossanherald.com - RDCAM",
        body: "A North Ayrshire crematorium is offering free webcasting of services during the COVID-19 pandemic. Clyde Coast and Garnock Valley Crematorium is helping the public follow government advice to avoid gatherings by live streaming their loved one’s final goodbye. A spokesperson for the crematorium, located in Clyde Muirshiel Regional Park, said: “We are doing everything we can to support families and continue to provide a safe space for essential funeral services.” Although all social events, including weddings and baptisms, are to be stopped, funerals can still go ahead. However, social distancing guidance must still be observed. The consequence of this advice is that wider family members and friends of the deceased who would have ordinarily attended funeral services are being asked to consider not attending - to physically stay away - to reduce the number of people congregating. A spokesperson for Clyde Coast and Garnock Valley Crematorium said: “To help families given these difficult and exceptional circumstances, we are offering complimentary webcasts of full services held in our Ceremony Hall. “Families can provide friends with a secure log-in and they can still watch and be part of the funeral service, albeit remotely.” The technology will help families and friends pay their respects during the coronavirus crisis whilst still following government advice to remain at home. The team at Clyde Coast and Garnock Valley Crematorium believe they are the only crematorium to offer this service and are hoping others across the country can do the same, although not all crematoria have the technology to do so. For more information visit or .",
        title: "Clyde and Garnock Valley crematorium offers free webcasts during COVID-19",
        annotationSource: null,
        sourceTypeIsSocial: "false",
        sourceTypeName: "Traditional Media",
        publicDate: "2020-03-24T10:54:41.000Z",
        updateDate: "2020-04-08T22:19:29.354Z",
        documentLanguage: "en",
        id: "772279853"
    },
    .
    .
    .

Returns information about the documents retrieved from the Elastic index and the time taken to perform the task. The method loads a batch of documents to be stored as a collection. Collections are used by the buildModel service method, which trains an LDA model on the retrieved documents.

HTTP POST Request

http://identrics.net:8086/services/dataloader/search

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
collection name Collection name. Usually the mapping that corresponds to particular search string executed on a date/time.
Lucene query Detailed syntax description
startDate Start date of original date (filters Lucene query result).
endDate End date of original date (filters Lucene query result).

Response Properties

Property Description
message Detail information about the task execution.
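
The date parameters in the example are UTC timestamps with millisecond precision and a trailing `Z`. A sketch of producing that format and assembling the parameter list in the order used by the curl example above (context, query, start, end); `to_api_timestamp` and `search_params` are hypothetical helper names.

```python
from datetime import datetime, timezone


def to_api_timestamp(moment):
    """Format a timezone-aware datetime like 2020-03-01T22:00:00.000Z."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def search_params(context, query, start, end):
    """Build the positional parameters, following the curl example."""
    if start > end:
        raise ValueError("start must not be later than end")
    return [context, query, to_api_timestamp(start), to_api_timestamp(end)]


params = search_params(
    "elasticprod",
    "source_id:(425773)",
    datetime(2020, 3, 1, 22, 0, tzinfo=timezone.utc),
    datetime(2020, 3, 30, 22, 0, tzinfo=timezone.utc),
)
print(params)
```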

Apply NER to elastic document

In the following snippet, the language parameter is Chinese ("zh-cn").

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "elasticprod",
    "zh-cn",
    "257601622",
    "item_id"
]' \
http://identrics.net:8086/dataloader/applyNerToElasticDocument

The above command returns JSON structured like this (the same structure as getAnnotations):

[
    {
        word: "英政府",
        annotationURI: null,
        documentId: null,
        index: 12,
        annotationSource: null,
        startOffset: 27,
        endOffset: 30,
        annotationClass: "PERSON",
        annotationId: null,
        annotationType: "ner"
    },
    {
        word: "16",
        annotationURI: null,
        documentId: null,
        index: 3,
        annotationSource: null,
        startOffset: 69,
        endOffset: 71,
        annotationClass: "NUMBER",
        annotationId: null,
        annotationType: "ner"
    },
    {
        word: "11月",
        annotationURI: null,
        documentId: null,
        index: 11,
        annotationSource: null,
        startOffset: 94,
        endOffset: 97,
        annotationClass: "DATE",
        annotationId: null,
        annotationType: "ner"
    }
]

Applies the loose-clean text-cleaner pipeline to the content of an Elastic document and calls getAnnotations. Returns the document's NER tags.

HTTP POST Request

http://identrics.net:8086/dataloader/applyNerToElasticDocument

Query Parameters

Parameter Description
context Context name. Contexts are predefined and project-specific, because settings vary widely between projects.
language Language corresponding to the NER context name (en, bg, fr). If null, langDetect is used.
field value Lucene lookup field value.
field type Lucene lookup field type.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId null The unique identifier of document.
annotationClass The class label as defined in classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of text corresponding to the mention of entity.
index Word position of entity mention inside text.
startOffset Character position of the beginning of entity mention inside text.
endOffset Character position of the end of entity mention inside text.

Create DataSet From Search String

In the following snippet, the query selects German-language documents.

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "stare",
    "source_id:(19278 OR 19276 OR 431077 OR 431269) AND language:\"de\"",
    "Sentiment training dataset for german sources",
    "Contains traditional media documents from german medias"
]' \
http://identrics.net:8086/dataloader/createDataSetFromSearchString

The above command returns message structured like this:

Adds documents to a training dataset (TDS) and groups them. Document metadata is extracted from the Elastic index and saved in RDF. The content of the documents is saved into a separate Document Similarity context in order to provide accurate similarity scores for a particular task.

HTTP POST Request

http://identrics.net:8086/dataloader/createDataSetFromSearchString

Query Parameters

Parameter Description
context Context name. Contexts are predefined and project-specific, because settings vary widely between projects.
Lucene query Detailed syntax description
TDS label Label/name of training data set.
TDS definition Definition of training data set.
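
The query in the example combines a list of source IDs with a language filter. A sketch of assembling such a query string; the field names `source_id` and `language` come from the example above, and `source_language_query` is a hypothetical helper name.

```python
def source_language_query(source_ids, language):
    """Build a query like: source_id:(1 OR 2) AND language:"de"."""
    if not source_ids:
        raise ValueError("at least one source ID is required")
    ids = " OR ".join(str(source_id) for source_id in source_ids)
    return f'source_id:({ids}) AND language:"{language}"'


query = source_language_query([19278, 19276, 431077, 431269], "de")
print(query)
```

When the query is embedded in the JSON parameter array, a JSON serializer escapes the inner quotes, which is why the curl example writes them as `\"de\"`.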

Annotations

The Annotation service is the central point for invoking all other prediction services, such as Classifier, NER and Docsim. It is intended to make multiple predictions with a single HTTP request. It also manages the annotation layer of the Identrics ML workflow. Annotation objects are associated with metadata about the content to be analysed. All mentions, positions, types and anything else that can be stated about a particular document are annotations about that document.

Annotate document [automatic lang detection]

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "demo",
    "0000001",
    "Police are looking a man who has been charged with a fatal stabbing Sunday morning in Pisgah View Apartments. He should be considered armed and dangerous, APD said. Police responded to the complex at 10:32 a.m. after reports of a stabbing, according to APD spokeswoman Christina Hallingse. Officers found 39-year-old Justin Paul Digiacomo, an Asheville resident since 2008, with a wound to the upper torso. Digiacomo died on the scene. Police took out an arrest warrant for Cecil Thorpe, 53, accusing him of second-degree murder, Hallingse said in a press release Sunday evening. Police consider Thorpe to be armed and dangerous. He is described as 5-foot-7 and 150 pounds, with brown eyes, salt-and-pepper colored hair and a beard. Neither Digiacomo nor Thorpe had established residences in Pisgah View but were known to stay there from time to time, Hallingse said. APD asks for anyone with information about the incident or knowledge of Thorpe%27s whereabouts to call police at 828-252-1110 or Crime Stoppers at 828-255-5050."
]' \
http://identrics.net:8087/services/annotation/annotate

The above command returns JSON annotations structured like this:

[
    {
        documentId: "0000001",
        annotationClass: "CORMAN",
        annotationSource: "fibep_cat_en_Traditional_media_RAkELd-51",
        annotationId: "0000001_CORMAN",
        annotationType: "taxonomy"
    },
    {
        documentId: "0000001",
        annotationClass: "0",
        annotationSource: "s_en_Traditional_media_Bussiness_SMO-762",
        annotationId: "0000001_0",
        annotationType: "sentiment"
    },
    {
        annotationURI: null,
        endOffset: 158,
        documentId: "0000001",
        index: 8,
        word: "APD",
        startOffset: 155,
        annotationClass: "ORGANIZATION",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_APD_155",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 288,
        documentId: "0000001",
        index: 19,
        word: "Christina Hallingse",
        startOffset: 269,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Christina_Hallingse_269",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 338,
        documentId: "0000001",
        index: 5,
        word: "Justin Paul Digiacomo",
        startOffset: 317,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Justin_Paul_Digiacomo_317",
        annotationType: "ner"
    }
]

To annotate a single document, run the request:

HTTP POST Request

http://identrics.net:8087/services/annotation/annotate

Query Parameters

Parameter Description
context name Context application name. Usually the name of project.
document ID Unique identifier for document.
text The text of the document to be annotated.

Response Properties

Property Default Description
annotationId null Annotation unique identifier. This id is populated only by "Annotation" endpoint.
annotationSource null The name of prediction model.
documentId null The unique identifier of document.
annotationClass The class label as defined in classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of text corresponding to the mention of entity.
index Word position of entity mention inside text.
startOffset Character position of the beginning of entity mention inside text.
endOffset Character position of the end of entity mention inside text.
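
In the sample response above, `startOffset` and `endOffset` behave like a zero-based, end-exclusive character range into the submitted text. A sketch that recovers a mention from those offsets, using the opening sentences of the sample document; `mention_text` is a hypothetical helper, and the offset convention is inferred from the sample rather than stated by the service.

```python
def mention_text(text, annotation):
    """Return the slice of `text` that an NER annotation points at."""
    return text[annotation["startOffset"]:annotation["endOffset"]]


# Opening sentences of the sample document sent to annotate.
text = (
    "Police are looking a man who has been charged with a fatal stabbing "
    "Sunday morning in Pisgah View Apartments. He should be considered "
    "armed and dangerous, APD said."
)
# Offsets copied from the "APD" annotation in the sample response.
annotation = {"word": "APD", "startOffset": 155, "endOffset": 158}

print(mention_text(text, annotation))
```

Comparing the recovered slice with the `word` property is a cheap consistency check before using the offsets for highlighting.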

Annotate document [language parameter]

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "demo",
    "0000001",
    "Police are looking a man who has been charged with a fatal stabbing Sunday morning in Pisgah View Apartments. He should be considered armed and dangerous, APD said. Police responded to the complex at 10:32 a.m. after reports of a stabbing, according to APD spokeswoman Christina Hallingse. Officers found 39-year-old Justin Paul Digiacomo, an Asheville resident since 2008, with a wound to the upper torso. Digiacomo died on the scene. Police took out an arrest warrant for Cecil Thorpe, 53, accusing him of second-degree murder, Hallingse said in a press release Sunday evening. Police consider Thorpe to be armed and dangerous. He is described as 5-foot-7 and 150 pounds, with brown eyes, salt-and-pepper colored hair and a beard. Neither Digiacomo nor Thorpe had established residences in Pisgah View but were known to stay there from time to time, Hallingse said. APD asks for anyone with information about the incident or knowledge of Thorpe%27s whereabouts to call police at 828-252-1110 or Crime Stoppers at 828-255-5050.",
    "en"
]' \
http://identrics.net:8087/services/annotation/annotate

The above command returns JSON annotations structured like this:

[
    {
        documentId: "0000001",
        annotationClass: "CORMAN",
        annotationSource: "fibep_cat_en_Traditional_media_RAkELd-51",
        annotationId: "0000001_CORMAN",
        annotationType: "taxonomy"
    },
    {
        documentId: "0000001",
        annotationClass: "0",
        annotationSource: "s_en_Traditional_media_Bussiness_SMO-762",
        annotationId: "0000001_0",
        annotationType: "sentiment"
    },
    {
        annotationURI: null,
        endOffset: 158,
        documentId: "0000001",
        index: 8,
        word: "APD",
        startOffset: 155,
        annotationClass: "ORGANIZATION",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_APD_155",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 288,
        documentId: "0000001",
        index: 19,
        word: "Christina Hallingse",
        startOffset: 269,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Christina_Hallingse_269",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 338,
        documentId: "0000001",
        index: 5,
        word: "Justin Paul Digiacomo",
        startOffset: 317,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Justin_Paul_Digiacomo_317",
        annotationType: "ner"
    }
]

Annotating single document

HTTP POST Request

http://identrics.net:8087/services/annotation/annotate

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier for the document.
text The text to annotate.
language Language of the given document, for example "en".

Response Properties

Property Default Description
annotationId null Unique identifier of the annotation. This id is populated only by the "Annotation" endpoint.
annotationSource null The name of the prediction model.
documentId null The unique identifier of the document.
annotationClass The class label as defined in the classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of the text corresponding to the entity mention.
index Word position of the entity mention inside the text.
startOffset Character position of the beginning of the entity mention inside the text.
endOffset Character position of the end of the entity mention inside the text.
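
In the sample response above, startOffset and endOffset behave like zero-based, end-exclusive character positions: "APD" spans 155 to 158 in the submitted text. Assuming that convention holds in general (the API does not state it), a mention can be recovered from the original text with a plain slice. The helper name mention_text is illustrative and not part of the API.

```python
def mention_text(text, annotation):
    """Return the span of `text` that an entity annotation points at.

    Assumes zero-based, end-exclusive character offsets, which matches
    the sample response but is not stated by the API itself.
    """
    return text[annotation["startOffset"]:annotation["endOffset"]]


# Opening sentences of the document submitted in the request above.
text = (
    "Police are looking a man who has been charged with a fatal stabbing "
    "Sunday morning in Pisgah View Apartments. He should be considered "
    "armed and dangerous, APD said."
)

# One "ner" annotation copied from the sample response.
annotation = {
    "word": "APD",
    "startOffset": 155,
    "endOffset": 158,
    "annotationClass": "ORGANIZATION",
    "annotationType": "ner",
}

span = mention_text(text, annotation)
# The sliced span should agree with the "word" property of the annotation.
print(span == annotation["word"])
```

Comparing the slice with the word property is a cheap sanity check that the text held by the client is the same text the service annotated.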

Get annotations

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "demo",
    "0000001"
]' \
http://identrics.net:8087/services/annotation/getAnnotations

The above command returns JSON annotations structured like this:

[
    {
        documentId: "0000001",
        annotationClass: "CORMAN",
        annotationSource: "fibep_cat_en_Traditional_media_RAkELd-51",
        annotationId: "0000001_CORMAN",
        annotationType: "taxonomy"
    },
    {
        documentId: "0000001",
        annotationClass: "0",
        annotationSource: "s_en_Traditional_media_Bussiness_SMO-762",
        annotationId: "0000001_0",
        annotationType: "sentiment"
    },
    {
        annotationURI: null,
        endOffset: 158,
        documentId: "0000001",
        index: 8,
        word: "APD",
        startOffset: 155,
        annotationClass: "ORGANIZATION",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_APD_155",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 288,
        documentId: "0000001",
        index: 19,
        word: "Christina Hallingse",
        startOffset: 269,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Christina_Hallingse_269",
        annotationType: "ner"
    },
    {
        annotationURI: null,
        endOffset: 338,
        documentId: "0000001",
        index: 5,
        word: "Justin Paul Digiacomo",
        startOffset: 317,
        annotationClass: "PERSON",
        annotationSource: "en-model-1.ser.gz",
        annotationId: "0000001_Justin_Paul_Digiacomo_317",
        annotationType: "ner"
    }
]

Getting stored annotations for document

HTTP POST Request

http://identrics.net:8087/services/annotation/getAnnotations

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier for the document.

Response Properties

Property Default Description
annotationId null Unique identifier of the annotation. This id is populated only by the "Annotation" endpoint.
annotationSource null The name of the prediction model.
documentId null The unique identifier of the document.
annotationClass The class label as defined in the classifier model.
annotationType taxonomy The type of the annotation according to the business task.
word Exact portion of the text corresponding to the entity mention.
index Word position of the entity mention inside the text.
startOffset Character position of the beginning of the entity mention inside the text.
endOffset Character position of the end of the entity mention inside the text.
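
The curl calls in this document all share one shape: a POST whose form field json carries a positional JSON array, plus the X-Trinity-Apikey header. A minimal Python sketch of that shape follows, using only the standard library. The helper name build_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_request(url, params, api_key):
    """Build a POST request whose form field `json` holds the positional
    parameter array, mirroring the curl examples in this document."""
    body = urllib.parse.urlencode({"json": json.dumps(params)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-Trinity-Apikey": api_key},
        method="POST",
    )


request = build_request(
    "http://identrics.net:8087/services/annotation/getAnnotations",
    ["demo", "0000001"],  # context name, document ID
    "YOUR_TOKEN_HERE",
)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       payload = response.read().decode("utf-8")
print(request.get_method(), request.full_url)
```

Building the array with json.dumps and URL-encoding it avoids hand-escaping quotes and apostrophes inside long document texts.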

Add annotations

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "demo",
    "0000001",
    [
        {
            "documentId":"0000001",
            "annotationClass":"CORMER",
            "annotationId":"0000001_CORMER",
            "annotationSource":"fibep_cat_en_Traditional_media_RAkELd-51",
            "annotationType":"multilabel"
        },
        {
            "documentId":"0000001",
            "annotationURI":null,
            "endOffset":1522,
            "index":26,
            "word":"Aircraft_factory 558",
            "startOffset":1501,
            "annotationClass":"ORGANIZATION",
            "annotationId":"0000001_Aircraft_factory_558_1501",
            "annotationSource":"en-model-1.ser.gz",
            "annotationType":"ner"
        }
    ]
]' \
http://identrics.net:8087/services/annotation/addAnnotations

The above command returns message:

Storing annotations for document

HTTP POST Request

http://identrics.net:8087/services/annotation/addAnnotations

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
document ID Unique identifier for the document.
annotations List of annotation objects in JSON format.

Annotation properties

Property Default Description
annotationId null Unique identifier of the annotation. This id is populated only by the "Annotation" endpoint.
annotationSource null The name of the prediction model or the e-mail address of the annotator.
documentId null The unique identifier of the document.
annotationClass The class label as defined in the classifier model.
annotationType multiclass The type of the annotation according to the business task.
word Exact span of the text corresponding to the entity mention.
index Word position of the entity mention inside the text.
startOffset Character position of the beginning of the entity mention inside the text.
endOffset Character position of the end of the entity mention inside the text.
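
The annotationId values in the samples appear to follow a naming pattern: document id plus class label for classifier annotations, and document id plus the word (spaces replaced by underscores) plus the start offset for entity mentions. The sketch below reproduces that observed pattern when assembling the annotations argument. The pattern is inferred from the examples only, and both helper names are hypothetical.

```python
def class_annotation_id(document_id, annotation_class):
    """Id shaped like the classifier samples: <documentId>_<annotationClass>."""
    return f"{document_id}_{annotation_class}"


def ner_annotation_id(document_id, word, start_offset):
    """Id shaped like the NER samples:
    <documentId>_<word with spaces as underscores>_<startOffset>."""
    return f"{document_id}_{word.replace(' ', '_')}_{start_offset}"


document_id = "0000001"

# Annotation list matching the second and third arguments of addAnnotations.
annotations = [
    {
        "documentId": document_id,
        "annotationClass": "CORMER",
        "annotationId": class_annotation_id(document_id, "CORMER"),
        "annotationSource": "fibep_cat_en_Traditional_media_RAkELd-51",
        "annotationType": "multilabel",
    },
    {
        "documentId": document_id,
        "word": "Aircraft_factory 558",
        "startOffset": 1501,
        "endOffset": 1522,
        "annotationClass": "ORGANIZATION",
        "annotationId": ner_annotation_id(document_id, "Aircraft_factory 558", 1501),
        "annotationSource": "en-model-1.ser.gz",
        "annotationType": "ner",
    },
]

params = ["demo", document_id, annotations]
print([a["annotationId"] for a in annotations])
```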

Topic-modeler

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body.

Load Collection

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "2020-04-03",
    "source_id:(425773)",
    "2020-03-01T22:00:00.000Z",
    "2020-03-30T22:00:00.000Z"
]' \
http://identrics.net:8089/services/topicmodeler/loadData

The above command returns a message structured like this:

Loads a batch of documents from the Elastic index and stores it as a collection. The response reports the documents retrieved and the time taken to perform the task. Collections are consumed by the buildModel service method, which trains an LDA model on the retrieved documents.

HTTP POST Request

http://identrics.net:8089/services/topicmodeler/loadData

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
collection name Collection name. Usually the mapping that corresponds to a particular search string executed on a date/time.
Lucene query The Lucene query that selects the documents to load (see the detailed Lucene syntax description).
startDate Start of the range applied to the documents' original date (filters the Lucene query result).
endDate End of the range applied to the documents' original date (filters the Lucene query result).

Response Properties

Property Description
message Detailed information about the task execution.
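
The startDate and endDate values in the loadData example are UTC timestamps with millisecond precision and a trailing Z. A small sketch of producing that format from Python datetime objects follows. The helper name to_api_timestamp is hypothetical, and whether the service accepts other date formats is not covered here.

```python
from datetime import datetime, timezone


def to_api_timestamp(moment):
    """Format a timezone-aware datetime as UTC with millisecond precision
    and a 'Z' suffix, the shape used in the loadData example."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


start = to_api_timestamp(datetime(2020, 3, 1, 22, 0, tzinfo=timezone.utc))
end = to_api_timestamp(datetime(2020, 3, 30, 22, 0, tzinfo=timezone.utc))

# Positional arguments: context name, collection name, Lucene query, startDate, endDate.
params = ["en", "2020-04-03", "source_id:(425773)", start, end]
print(params)
```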

Build Model

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "2020-04-03",
    7,
    50
]' \
http://identrics.net:8089/services/topicmodeler/buildModel

The above command returns a message structured like this:

Builds a single LDA model using the following library.

HTTP POST Request

http://identrics.net:8089/services/topicmodeler/buildModel

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
collection name Collection name. Usually the mapping that corresponds to a particular search string executed on a date/time.
Number of topics The number of topics that the model will be able to make inference on.
Number of iterations The number of iterations the model will be trained for.

Response Properties

Property Description
message Detailed information about the task execution.

Build HPO Model

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "2020-04-03",
    3,
    25,
    2,
    10,
    50
]' \
http://identrics.net:8089/services/topicmodeler/buildHPOModel

The above command returns a message structured like this:

Builds multiple LDA models, evaluates their coherence metrics and serializes the best model using the following library.

HTTP POST Request

http://identrics.net:8089/services/topicmodeler/buildHPOModel

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
collection name Collection name. Usually the mapping that corresponds to a particular search string executed on a date/time.
Start number of topics Lower bound (K_MIN) of the range of topic counts to evaluate.
End number of topics Upper bound (K_MAX) of the range of topic counts to evaluate.
Step Increment used to walk the [K_MIN:K_MAX] range, skipping the K values in between. If the step is 1, no K is skipped.
Number of top words Number of top words that define a topic; used in Diagnostics & Representation.
Number of iterations Number of iterations applied to each LDA model within the given range.

Response Properties

Property Description
message Detailed information about the task execution.
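
To make the role of the step parameter concrete, the sketch below lists the topic counts that a start of 3, an end of 25 and a step of 2 would visit. Treating the end value as inclusive is an assumption, since the documentation does not say whether K_MAX itself is evaluated. The function name is hypothetical.

```python
def candidate_topic_counts(k_min, k_max, step):
    """List the topic counts visited when walking [k_min, k_max] by `step`.

    Treating k_max as inclusive is an assumption about the service.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(range(k_min, k_max + 1, step))


# Values taken from the buildHPOModel example request.
counts = candidate_topic_counts(3, 25, 2)
print(len(counts), counts)
```

Each value in the list corresponds to one LDA model that would be trained and scored, so a smaller step lengthens the run roughly in proportion.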

Get Topics List

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "2020-04-03"
]' \
http://identrics.net:8089/services/topicmodeler/getTopicsList

The above command returns JSON structured like this:

[
    {
        topicId: 4,
        description: null,
        title: null,
        topWords: [
            "school",
            "service",
            "north",
            "positive",
            "parents",
            "week",
            "place"
        ]
    },
    {
        topicId: 5,
        description: null,
        title: null,
        topWords: [
            "local",
            "locality",
            "ayrshire",
            "glasgow",
            "water",
            "ardrossan",
            "council"
        ]
    },
    {
        topicId: 0,
        description: null,
        title: null,
        topWords: [
            "ayrshire",
            "coronavirus",
            "covid",
            "people",
            "nhs",
            "scotland",
            "advice"
        ]
    },
    {
        topicId: 3,
        description: null,
        title: null,
        topWords: [
            "community",
            "people",
            "kilwinning",
            "support",
            "club",
            "residents",
            "group"
        ]
    },
    {
        topicId: 1,
        description: null,
        title: null,
        topWords: [
            "claire",
            "north",
            "family",
            "life",
            "ahead",
            "kidney",
            "forward"
        ]
    },
    {
        topicId: 2,
        description: null,
        title: null,
        topWords: [
            "cases",
            "hospital",
            "scotland",
            "patients",
            "total",
            "confirmed",
            "virus"
        ]
    },
    {
        topicId: 6,
        description: null,
        title: null,
        topWords: [
            "people",
            "health",
            "mental",
            "day",
            "support",
            "call",
            "saltcoats"
        ]
    }
]

Gets the list of topics created after the buildModel service method execution.

HTTP POST Request

http://identrics.net:8089/services/topicmodeler/getTopicsList

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
collection name Collection name. Usually the mapping that corresponds to a particular search string executed on a date/time.

Response Properties

Property Default Description
topicId Auto-incremented topic id.
topWords Array of top words for the topic.
title null Curated topWords by user [To Be Implemented]
description null Curated description by user [To Be Implemented]
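
Because title and description are still null in the response, a client may want to derive a readable label from topWords. The sketch below shows one way to do that, falling back to the first few top words when no title is set. The function name and the choice of three words per label are illustrative assumptions.

```python
def topic_labels(topics, words_per_label=3):
    """Map each topicId to its title, or to its first top words when
    the title is missing (currently always the case)."""
    labels = {}
    for topic in topics:
        fallback = ", ".join(topic["topWords"][:words_per_label])
        labels[topic["topicId"]] = topic["title"] or fallback
    return labels


# Two entries copied from the sample getTopicsList response.
topics = [
    {"topicId": 4, "description": None, "title": None,
     "topWords": ["school", "service", "north", "positive", "parents", "week", "place"]},
    {"topicId": 5, "description": None, "title": None,
     "topWords": ["local", "locality", "ayrshire", "glasgow", "water", "ardrossan", "council"]},
]

labels = topic_labels(topics)
print(labels)
```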

Get Topics Distribution

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "2020-04-03"
]' \
http://identrics.net:8089/services/topicmodeler/getTopicsDistribution

The above command returns JSON structured like this:

[
    {
        docId: "771491293",
        probability: 0.41203007518796997,
        annotationId: "771491293_4",
        annotationSource: "en_2020-04-03",
        annotationClass: "4",
        annotationType: "topic"
    },
    {
        docId: "771491293",
        probability: 0.443609022556391,
        annotationId: "771491293_5",
        annotationSource: "en_2020-04-03",
        annotationClass: "5",
        annotationType: "topic"
    },
    {
        docId: "771003235",
        probability: 0.1474654377880184,
        annotationId: "771003235_0",
        annotationSource: "en_2020-04-03",
        annotationClass: "0",
        annotationType: "topic"
    },
    .
    .
    .

Retrieves all documents with their corresponding topic id.

HTTP POST Request

http://identrics.net:8089/services/topicmodeler/getTopicsDistribution

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
collection name Collection name. Usually the mapping that corresponds to a particular search string executed on a date/time.

Response Properties

Property Description
docId Document identifier.
probability Probability that the document belongs to the topic. The maximum value is 1.
annotationId Combines document id and topic id.
annotationSource Combines context and collection.
annotationClass Topic identifier.
annotationType The type of the annotation. In this case it is topic.
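
The sample response lists one entry per document and topic pair, so a document can appear several times. A common follow-up step is to keep only the most probable topic for each document. The sketch below does that over entries copied from the sample response. The function name is hypothetical.

```python
def dominant_topics(distribution):
    """Return {docId: annotationClass} for the highest-probability
    topic entry of each document."""
    best = {}
    for entry in distribution:
        current = best.get(entry["docId"])
        if current is None or entry["probability"] > current["probability"]:
            best[entry["docId"]] = entry
    return {doc_id: entry["annotationClass"] for doc_id, entry in best.items()}


# Entries copied from the sample response (fields trimmed to those used).
distribution = [
    {"docId": "771491293", "probability": 0.41203007518796997, "annotationClass": "4"},
    {"docId": "771491293", "probability": 0.443609022556391, "annotationClass": "5"},
    {"docId": "771003235", "probability": 0.1474654377880184, "annotationClass": "0"},
]

dominant = dominant_topics(distribution)
print(dominant)
```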

Inference

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "en",
    "666",
    "2020-04-03",
    "Man arrested following four-hour standoff in southeast Fort Collins One man is in custody following an hourslong standoff in southeast Fort Collins on Sunday night. Michael Aguirre Jr., 42, was arrested on suspicion of false imprisonment, a Class 5 felony; menacing, a Class 5 felony; child abuse, a Class 2 misdemeanor; and domestic violence. About 5:30 p.m. Sunday, Fort Collins police responded to the report of a domestic disturbance on Antigua Drive, southeast of the Lemay Avenue and Trilby Road intersection. When officers arrived, they learned that Aguirre was armed with a gun inside his residence, according to a police news release. Police said everyone else had already exited the home by the time police arrived. Aguirre refused to acknowledge officers or comply with commands to exit the apartment, police said, and the SWAT team responded to assist. Story continues below the photo. Neighbor Nicole Cagle told the Coloradoan on Sunday that she called police when her neighbor's young son came to her house. The boy told her that Aguirre had a gun and wouldn't let his mom come out of the bathroom. The mother made the boy leave and he came to Cagle asking that she call police, she said. Cagle said the mother was able to get out of the house, but Aguirre wouldn't leave and was waving a gun around in the windows. Aguirre was taken into custody after approximately four hours, police said. Aguirre doesn't have a significant criminal history in Colorado, according to court records. Anyone with information, who hasn't already spoken to police, may contact Detective Annie Hill at 970-221-6340 or Crime Stoppers of Larimer County at 970-221-6868 . All suspects are innocent until proven guilty in court. Arrests and charges are merely accusations by law enforcement until, and unless, a suspect is convicted of a crime"
]' \
http://identrics.net:8089/services/topicmodeler/inference

The above command returns JSON structured like this:

[
    {
        docId: "666",
        probability: 0.12332730560578653,
        annotationId: "666_0",
        annotationSource: "en_2020-04-03",
        annotationClass: "0",
        annotationType: "topic"
    },
    {
        docId: "666",
        probability: 0.1663652802893308,
        annotationId: "666_2",
        annotationSource: "en_2020-04-03",
        annotationClass: "2",
        annotationType: "topic"
    },
    {
        docId: "666",
        probability: 0.12332730560578653,
        annotationId: "666_4",
        annotationSource: "en_2020-04-03",
        annotationClass: "4",
        annotationType: "topic"
    },
    {
        docId: "666",
        probability: 0.38155515370705223,
        annotationId: "666_5",
        annotationSource: "en_2020-04-03",
        annotationClass: "5",
        annotationType: "topic"
    }
]

Returns the predicted topics for the given text.

HTTP POST Request

http://identrics.net:8089/services/topicmodeler/inference

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
doc id Document id of the provided text.
collection name Collection name. Usually the mapping that corresponds to a particular search string executed on a date/time.
text Text that inference will be applied to.

Response Properties

Property Description
docId Document identifier.
probability Probability that the document belongs to the given topic. The maximum value is 1, meaning the document belongs entirely to that topic.
title Title, if one was manually created by the user.
annotationId Combines document id and topic id.
annotationSource Combines context and collection.
annotationClass Topic identifier.
annotationType The type of the annotation. In this case it is topic.
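
The inference response can contain several topics with low probabilities. The sketch below ranks the predictions and drops those under a cut-off. The 0.15 threshold and the function name are arbitrary illustrations, not values recommended by the service.

```python
def rank_topics(predictions, min_probability=0.0):
    """Sort topic predictions from most to least probable, keeping
    only those at or above `min_probability`."""
    kept = [p for p in predictions if p["probability"] >= min_probability]
    return sorted(kept, key=lambda p: p["probability"], reverse=True)


# Entries copied from the sample inference response (fields trimmed).
predictions = [
    {"docId": "666", "probability": 0.12332730560578653, "annotationClass": "0"},
    {"docId": "666", "probability": 0.1663652802893308, "annotationClass": "2"},
    {"docId": "666", "probability": 0.12332730560578653, "annotationClass": "4"},
    {"docId": "666", "probability": 0.38155515370705223, "annotationClass": "5"},
]

ranked = rank_topics(predictions, min_probability=0.15)
print([p["annotationClass"] for p in ranked])
```

The annotationClass values can then be looked up in the getTopicsList response to show the top words of each predicted topic.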

Pipeline

Pipelines organize pieces of processing logic, called stages, into sequences able to accomplish various tasks. The stages may be independent implementations or higher-level wrappers around existing services. In the current example, the "filter" pipeline consists of the following stages: create context -> search -> annotate -> docsim add -> build vector store -> get similar -> export to excel.

Execute pipeline

curl -H "X-Trinity-Apikey: YOUR_TOKEN_HERE" -X POST -d \
'json=
[
    "rdc",
    "filter",
    [
        ["language","zh-tw"],
        ["similarityScore","0.8"],
        ["luceneQuery","source_id:(445689)"],
        ["luceneQueryName","RDC_ZH-TW_QUERY"],
        ["startDate","2020-09-08T21:00:00.000Z"],
        ["endDate","2020-09-08T23:59:59.999Z"]
    ]
]' \
http://identrics.net:8088/services/pipeline/executePipeline

The above command returns a message structured like this:

Returns information about the time taken to perform the given pipeline.

HTTP POST Request

http://identrics.net:8088/services/pipeline/executePipeline

Query Parameters

Parameter Description
context name Context application name. Usually the name of the project.
pipeline name Name of the pipeline to execute ("filter" in the example above).
pipeline parameters List of [name, value] pairs. The parameters vary depending on the pipeline definition.

Response Properties

Property Description
message Detailed information about the task execution.
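
In the executePipeline example, the pipeline parameters are passed as a list of two-element [name, value] arrays with string values. The sketch below builds that list from a Python dictionary. The function name is hypothetical, and converting every value to a string simply mirrors the example rather than a documented requirement.

```python
def pipeline_parameters(settings):
    """Turn a settings dict into the list of [name, value] string pairs
    used as the third argument of executePipeline."""
    return [[name, str(value)] for name, value in settings.items()]


# Settings taken from the example request; dict order is preserved.
settings = {
    "language": "zh-tw",
    "similarityScore": 0.8,
    "luceneQuery": "source_id:(445689)",
    "luceneQueryName": "RDC_ZH-TW_QUERY",
    "startDate": "2020-09-08T21:00:00.000Z",
    "endDate": "2020-09-08T23:59:59.999Z",
}

# Positional arguments: context name, pipeline name, pipeline parameters.
params = ["rdc", "filter", pipeline_parameters(settings)]
print(params)
```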

Errors

The Identrics API uses the following error codes:

Error Code Meaning
400 Bad Request -- Your request is invalid.
401 Unauthorized -- Your API key is wrong.
403 Forbidden -- The service requested is hidden for administrators only.
404 Not Found -- The specified service could not be found.
405 Method Not Allowed -- You tried to access a service with an invalid method.
406 Not Acceptable -- You requested a format that isn't json.
410 Gone -- The service requested has been removed from our servers.
429 Too Many Requests -- You're sending too many requests.
500 Internal Server Error -- We had a problem with our server. Try again later.
503 Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
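
A client can use the table above to decide whether a failed call is worth repeating. The sketch below treats 429, 500 and 503 as transient and everything else as a problem with the request itself. That split is an assumption based on the wording of the table ("try again later"), not a rule stated by the API, and the names used are hypothetical.

```python
# Status codes assumed to be transient, based on the error table's wording.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 503})


def should_retry(status_code):
    """Return True when a failed request may succeed if sent again later."""
    return status_code in RETRYABLE_STATUS_CODES


for code in (401, 404, 429, 503):
    action = "retry later" if should_retry(code) else "fix the request"
    print(code, action)
```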