search

This API provides services for searching the Haufe Content for documents and retrieving them.



Note: This API only supports the Client Credentials flow. This means that you log in using only your application's client ID and client secret, without authenticating a user. In effect, the /token endpoint of any of the authentication methods listed below can be used for that.


Log in using a local username and password.

Token Endpoint


Quickstart

The first thing you need is a contact person at Haufe-Lexware (e.g. _HL_Panama@haufe-lexware.com) who knows your use case for ContentHub and can decide which scope is the right one for you (see below).

Once this is clear, you need to:

  1. Sign up for the portal
  2. Register your application
    1. Leave the checkbox for OAuth2.0 Flows unchecked
    2. If you are using a server-side application, make sure to select "Confidential: Server side application" from the Client Type dropdown
  3. Subscribe to the search API by clicking on the "Subscribe" button at the end of this page
    1. Leave the checkbox "Trust this application" unchecked
    2. Choose the trial or unlimited plan depending on your needs

After that, the admin at Haufe-Lexware will choose a scope for your use case and inform you about it. You can also look up the assigned scopes at any time: open the application under the "Applications" tab, then press the "Select" button to choose the subscription whose scopes you want to view.

Now you are ready to use the API. There are two important URLs:

  1. The token endpoint URL
  2. The API URL

You can find both at the top of this page. The token endpoint URL is shown when you unfold the section with the greenish background (entitled "Username and Password (local)").

Now use this command to request a token:

curl --location --request POST '<token-endpoint-url>' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=client_credentials' \
  --data-urlencode 'client_id=...' \
  --data-urlencode 'client_secret=...' \
  --data-urlencode 'scope=...'

The response to this request consists of a small JSON object that contains the token in the access_token property. To make a simple search request you need to pass the token as part of the Authorization header like this:

curl --location --request GET '<api-url>/search?q=dienstreise' \
  --header 'Authorization: Bearer <access-token>'

The q parameter contains the query string; the next sections describe it in detail.

The response contains a list of search results, each entry having an ID in the form of a URI starting with contenthub://. You can retrieve the so-called baseline content, which is an XHTML document, using a request like this:

curl --location --request GET '<api-url>/retrieval/baseline?contentHubId=...' \
  --header 'Authorization: Bearer <access-token>'
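The same three steps can be sketched in Python with the standard library only. This is a minimal sketch, not an official client: the function names are hypothetical, the token endpoint URL and API URL are the ones you looked up on this page, and the functions only build the requests so that you can inspect them before sending.

```python
import urllib.parse
import urllib.request


def token_request(token_url, client_id, client_secret, scope):
    """Build the Client Credentials token request (form-encoded POST)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def search_request(api_url, access_token, query):
    """Build a simple search request; the query is URL-encoded for you."""
    url = api_url + "/search?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + access_token})


def baseline_request(api_url, access_token, content_hub_id):
    """Build a request for the baseline content of one search result."""
    url = (api_url + "/retrieval/baseline?"
           + urllib.parse.urlencode({"contentHubId": content_hub_id}))
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + access_token})
```

To actually send a request, pass it to urllib.request.urlopen; for the token request, the access_token property can then be read from the JSON response with json.load.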

Congratulations! Now you have succeeded with the first steps of using the ContentHub!

If you want to know more about the details of searching and retrieving content, just read on...

Searching content

Queries and Search Expressions

Query Parameters

Parameter   Mandatory   Description
q   yes   query – Search expression (syntax described further below). The request may contain an arbitrary number of q parameters.
field   no   Defines weighted field references for use in (full-text) search expressions; see Local Field References.
offset   no   Index of the first result to return; allows paging through result lists.
limit   no   Maximum number of results to return; allows paging through result lists.
preview   no   Indicates whether the preview element of the original documents appears in the result entries.
 

Search Expressions

At the core of any search is a query, which states the criteria that matching documents should satisfy. A query can be as simple as a single word (income) or a (potentially complex) composition of several search expressions (("income" OR "revenue" -(tags:"revoked") title:"*statement*")).

Search expressions are applied to the fields specified in the search request. If no fields have been specified, the default is the document's title and the document's baselineContent and baselineSearchableText.

The list of supported search expressions is as follows:

Word

Just written as a word without any quotation marks. For instance income.

  • words are treated case-insensitively (e.g. Income or INCOME will be found as well)
  • words will be enriched by stemming (e.g. incomes will be found as well)

Phrase

One or many words encapsulated by double quotation marks. For instance: "income tax". Like words:

  • phrases are treated case-insensitively
  • phrases will be enriched by stemming

Unlike words:

  • the words in a phrase must appear in the same order in a matched document. E.g. "income tax" will only match documents containing both words income and tax in that sequence.

NOT (Negation)

Denoted by a minus sign - in front of a sub-expression. For instance -tax will match documents not containing the word tax.

AND (Conjunction)

Denoted by whitespace between adjacent search expressions. For instance tax income will only match documents containing both words tax and income. Contrary to the similar Phrase expression, neither the order of the two words nor the distance between them in the matching document is relevant.

OR (Disjunction)

Denoted by the keyword OR between adjacent search expressions. For instance tax OR income will match documents containing either the word tax or the word income, or both.

Group

Denoted by a pair of parentheses around a sequence of sub-expressions. Expression grouping will mostly happen implicitly according to the Expression Precedence Rules stated below. However, sometimes explicit grouping is required to get the correct result, namely when implicit grouping and precedence would result in an incorrect query.

Compare:

  • based on implicit grouping and precedence, the expression tax OR income statement will match documents containing either both words tax and statement or both words income and statement
  • if the intention is to find documents containing either the word tax (whether or not they contain the word statement) or both words income and statement, the query has to be written as tax OR (income statement)

Scope

Denoted by a scope name and a colon : in front of a sub-expression. Examples:

  • documentType:News will match documents having a metadata field documentType with value News, or
  • documentType:(News OR Article) will match documents having a metadata field documentType with a value of either News or Article, or
  • documentType:News application:portals will match documents having a metadata field documentType with the value News and a metadata field application with the value portals.

However, Scope expressions must not be nested, e.g. title:(author:someone) will be flagged as invalid.

The list of valid scope names is given in Local Field References and covers metadata fields (and thus search facets) as well as local field references, which are written as field, a single period and the reference id (e.g. field.tags). Additional scopes may be added in the future.

As a general rule, fields which belong to the ContentHub namespace (http://contenthub.haufe-lexware.com/haufe-document) are considered "well known" and have no special prefix. Fields which have been introduced by a content-producer (and are only used by that content-producer) are prefixed with the corresponding application id, separated by period (e.g., portals.category).

Please refer to Search fields to see which fields allow scoping with an exact value, a range expression or a search expression.

Range-Expression

Denoted either by

  • <=, >=, < or > as prefix in front of a word or
  • ... or .. as infix between two words

For instance:

  • sortDate:<=2014-12-31 will match all documents created or updated before or at the end of the year 2014, while
  • sortDate:2014...2015-12-31 will match all documents created or updated between the beginning of the year 2014 and the end of the year 2015.

Range expressions can only be used as expressions in a scope-expression. Using a range-expression outside of a scope-expression will be rejected.

Specifying Dates

Dates can be given in an absolute or a relative manner. While an absolute date is a fixed point in time, a relative date is always calculated from the current time.

Absolute Dates

The syntax for absolute dates is

absoluteDate : year ( '-' month ( '-' day ( 'T' ( time | any ) )? )? )? ;

year         : Digit Digit Digit Digit ;
month        : Digit Digit ;
day          : Digit Digit ;
time         : hours ( ':' minutes ( ':' seconds ( '.' milliseconds )? )? )? timezone? ;
hours        : Digit Digit ;
minutes      : Digit Digit ;
seconds      : Digit Digit ;
milliseconds : Digit+ ;
timezone     : 'Z' | ( plusOrMinus? Digit+ ':' Digit+ ) ;
any          : '*' ;
plusOrMinus  : ( '+' | '-' ) ;

Examples:

  • 2014 the first millisecond in 2014
  • 2020-10 the first millisecond in October 2020
  • 2021-12-31T23:59:50Z 10 seconds before the start of the year 2022 in London (UTC)

Relative Dates

Relative dates are calculated in relation to "now" (which is the current time). The syntax for relative dates is

relativeDate : dateOffset ( 'T' timeOffset )? ;
dateOffset   : plusOrMinus? ( yearOffset | monthOffset | weekOffset | dayOffset )+ ;
yearOffset   : Digit+ 'y' ;
monthOffset  : Digit+ 'm' ;
weekOffset   : Digit+ 'w' ;
dayOffset    : Digit+ 'd' ;
timeOffset   : hours ( ':' minutes ( ':' seconds )? )? ;
hours        : Digit Digit ;
minutes      : Digit Digit ;
seconds      : Digit Digit ;
plusOrMinus  : ( '+' | '-' ) ;

Examples:

  • -1d exactly 24 hours ago
  • +0dT06:30 6 hours and 30 minutes from now

Search all documents that have been ingested in the past 2 hours:

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=ingestionDate:>-0dT02:00' \
  --header 'Authorization: Bearer <token>'
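
If you generate relative dates programmatically, it can help to check them against the grammar before sending a request. The following is a minimal sketch of such a check as a regular expression. The names RELATIVE_DATE and is_relative_date are hypothetical, and the check covers syntax only, not how the server evaluates the offset.

```python
import re

# Mirrors the relativeDate grammar above: an optional sign, one or more
# <digits><unit> offsets (y, m, w, d) and an optional 'T' time offset
# of the form hh, hh:mm or hh:mm:ss.
RELATIVE_DATE = re.compile(
    r"[+-]?(?:[0-9]+[ymwd])+"
    r"(?:T[0-9]{2}(?::[0-9]{2}(?::[0-9]{2})?)?)?"
)


def is_relative_date(text):
    """Return True if text is a syntactically valid relative date."""
    return RELATIVE_DATE.fullmatch(text) is not None
```

For example, is_relative_date("-0dT02:00") accepts the value used in the request above, while a bare time offset such as "T06:30" is rejected because the grammar requires at least one date offset.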

Concrete examples (with curl)

Fulltext search

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=onboarding' \
  --header 'Authorization: Bearer <token>'

Search with multiple terms

curl --location --get 'https://api.contenthub.haufe.io/content/v1/search' \
  --data-urlencode 'q=onboarding prozess plattform' \
  --header 'Authorization: Bearer <token>'

Filtering for application (content source)

curl --location --get 'https://api.contenthub.haufe.io/content/v1/search' \
  --data-urlencode 'q=onboarding application:transformation' \
  --header 'Authorization: Bearer <token>'

Filtering for documentType

curl --location --get 'https://api.contenthub.haufe.io/content/v1/search' \
  --data-urlencode 'q=onboarding documentType:News' \
  --header 'Authorization: Bearer <token>'

Filter by one tag

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=tag:Organisationsentwicklung' \
  --header 'Authorization: Bearer <token>'

Filter by multiple tags - enumeration

curl --location --get 'https://api.contenthub.haufe.io/content/v1/search' \
  --data-urlencode 'q=tag:(Organisationsentwicklung OR Digitalisierung OR "New Work")' \
  --header 'Authorization: Bearer <token>'

Filter by multiple tags - wildcard

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=tag:*rganisation*' \
  --header 'Authorization: Bearer <token>'
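
Queries containing spaces, quotation marks or parentheses have to be URL-encoded when they are placed in a request URL. The following is a minimal sketch of building such a URL in Python. The helper name search_url is hypothetical, and the sketch assumes that the server accepts percent-encoded query values, as any standard HTTP server does.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://api.contenthub.haufe.io/content/v1/search"


def search_url(expressions, offset=None, limit=None):
    """Build a search URL with one q parameter per search expression."""
    pairs = [("q", expression) for expression in expressions]
    if offset is not None:
        pairs.append(("offset", offset))
    if limit is not None:
        pairs.append(("limit", limit))
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return SEARCH_URL + "?" + urlencode(pairs, quote_via=quote)


print(search_url(['tag:(Organisationsentwicklung OR "New Work")'], limit=10))
```

The list argument reflects that a request may contain an arbitrary number of q parameters; offset and limit are the paging parameters from the Query Parameters table.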

Expression Precedence Rules

While parsing queries the following expression precedence will be used (highest to lowest):

Level   Expression
6   Group
5   Phrase, Range
4   Scope
3   Not (Negation)
2   OR (Disjunction)
1   AND (Conjunction)
 

While most of the time expression precedence will follow intuition there are some possibly surprising scenarios, mostly around the combined use of AND and OR expressions.

E.g. income OR tax statement OR fine will be parsed as equivalent to (income OR tax) (statement OR fine). The surprise may be that OR expressions take precedence over AND expressions, contrary to e.g. Boolean logic. However, this choice was made because it aligns more closely with the common use of natural language, as reflected e.g. by left OR right handed.

Search fields

Two sets of fields can be used in query expressions: fields defined by ContentHub, which reference elements defined in the ContentHub document schema, and fields that reference elements defined by the content providers. The custom fields from content providers are prefixed with the application name of their content.

ContentHub fields

  • title - The title of the documents. Note that title elements can occur multiple times with several types. This field will point to all elements which are read as title (such as ch:title or ch:title[name='compoundTitle']). This means searches/constraints will apply to the text of all such elements of a document. This is included in the default fields that will be searched in with a weight of 2.
  • baseline - A combination of the baselineContent and the baselineSearchableText elements of the documents. This is included in the default fields with a weight of 1.
  • native - Referencing the nativeSearchableText element.
  • appDocId - Referencing the appDocId element
  • application - Referencing the application element. This field can be used as search facet.
  • documentType - Referencing the documentType element. This field can be used as search facet.
  • tag - Referencing the tag elements and searching in all of them. This field can be used as search facet.
  • quickSearchPhrase - Referencing the quickSearchPhrase elements and searching in all of them.
  • packageId - Referencing the packageId element and searching in all of them. This field can be used as search facet.
  • publisher - Referencing the publisher elements and searching in all of them.
  • creator - Referencing the creator elements and searching in all of them.
  • ingestionDate - Date when this document was ingested into ContentHub.
  • sortDate - Referencing the chronologicalSortDate element.
  • revisionDate - Referencing the revisionDate element.
  • publicationDate - Referencing the publicationDate element.
  • visible - Referencing the visible element.
  • fingerprint - Referencing the fingerprint element.

Custom application specific fields

idesk

  • idesk.documentType - Referencing the idesk specific documentType. This can be used as a search facet.
  • idesk.documentCategory - Referencing the idesk specific documentCategory. Can be used as a search facet.
  • idesk.documentSubcategory - Referencing the idesk specific documentSubcategory. Can be used as a search facet.
  • idesk.documentClassification - Referencing the idesk specific documentClassification. Can be used as a search facet.
  • idesk.subjectAreaId - Referencing the idesk specific subjectAreaId.
  • idesk.isRoot - Referencing the idesk specific isRoot element. Can be used as a search facet.
  • idesk.quickSearchField
  • idesk.rootId - Referencing the idesk specific rootId.

portals

  • portals.category - Referencing the portals specific category element. Can be used as search facet.
  • portals.subcategory - Referencing the portals specific subcategory element. Can be used as search facet.
  • portals.visibleInSuite

academy

  • academy.subjectArea - Referencing the academy specific subjectArea element. Can be used as search facet.

haufeshop

  • haufeshop.mediaType - Referencing the haufeshop specific mediaType element. Can be used as search facet.
  • haufeshop.topProduct

hot

  • hot.category_title
  • hot.category_id
  • hot.parent_category_id
  • hot.sold_out

Local Field References

Local field references are used to bind names to locations within content hub documents and to specify the relative relevance weight associated with these locations, respectively. These names can then be used to specify scopes of full-text queries in Search Expressions.

Each occurrence of the query parameter field defines a single local field reference. (Note that a search request can include an arbitrary number of field query parameters!)

The syntax of a local field reference is as follows (in EBNF syntax)

local-field-reference = reference-id ':' field-spec { sep field-spec } ; <1>

reference-id = ref-start-char ref-char* ; <2> <3>

ref-start-char = letter | '_' ;
ref-char = digit | '-' | ref-start-char ;

field-spec = field-name { ',' qualification } ; <4>

field-name = ? database field name, in practice an NCName ? ; <5>

qualification = weight ;

weight = 'weight:' decimal ; <6> <7>

decimal = [ '+' | '-' ] ( '0' | ( non-zero-digit digit* ) ) [ '.' digit* ] ;
sep = ( ' ' | '+' ) { sep } ;
letter = ? any of the characters 'A' - 'Z' and 'a' - 'z' ? ;
digit = '0' | non-zero-digit ;
non-zero-digit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
  1. If more than one field specification is given, then the scope defined by this local field reference is the union of all specified fields (with their respective weights taken into account for the computation of relevance scores in case of matches).

  2. The reference id is used to refer to this local field reference in a field query scope. If, say, the reference id is tags, then the search expression field.tags:term will search for occurrences of term in the fields specified by the local field reference with id tags.

  3. Reference identifiers must be unique within each search request; i.e., there must not be two field query parameters that define references with the same id.

  4. If no qualification is specified, then weight:1.0 is assumed.

  5. So far, the list of applicable field names is static and available from the content hub team. Once additional fields can be defined more or less on the fly, there will be an API to fetch this list.

  6. A weight equal to 0.0 means that matches in the respective field do not contribute to the relevance score.

  7. The current implementation cuts off values that fall outside the interval [-16, 64]. Values with an absolute value smaller than 1/16 = 0.0625 are rounded to 0.
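
As an illustration of the grammar, the following minimal sketch assembles the value of one field parameter. The helper name local_field_reference is hypothetical. The field names tag and title are taken from the Search fields section, and whether they are applicable in local field references is an assumption that has to be confirmed with the ContentHub team (see note 5).

```python
import re

# reference-id = ref-start-char ref-char*
REFERENCE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def format_weight(weight):
    """Render a weight as a plain decimal without exponent notation."""
    return f"{weight:.4f}".rstrip("0").rstrip(".")


def local_field_reference(reference_id, fields):
    """Build a field parameter value from (field_name, weight) pairs.

    A weight of None omits the qualification, so weight:1.0 is assumed.
    """
    if not REFERENCE_ID.fullmatch(reference_id):
        raise ValueError("invalid reference id: " + reference_id)
    specs = []
    for name, weight in fields:
        if weight is None:
            specs.append(name)
        else:
            specs.append(name + ",weight:" + format_weight(weight))
    # Field specifications are separated by a single space (sep).
    return reference_id + ":" + " ".join(specs)


print(local_field_reference("tags", [("tag", 2), ("title", None)]))
```

A request that passes this value as field could then use the search expression field.tags:term in its q parameter. Remember to URL-encode the value, since it contains a space.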

Sorting Result Sets

By default, documents within search results will be ordered by decreasing relevance. Relevance is scored by an algorithm that takes into account things like search term position and frequency, document type and age, and various other attributes, and it generally provides good results.

However, on some occasions, users might want tighter control over the result sorting. For this, an additional parameter sorting can be passed to a search request. Pass a field name to sort the result set in ascending order by the values of this field. To sort in descending order, prefix the field with a hyphen -.

All the fields from the ContentHub fields section can be used for sorting, e.g. title or documentType; in these cases the documents will be sorted alphabetically (by the specified field).

To sort by the date that content providers set as the sort date for their documents, use the field sortDate. So to get the most recent documents on top, use -sortDate.

Concrete example (with curl)

Search for onboarding and sort the results in alphabetical order by title.

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=onboarding&sorting=title' \
  --header 'Authorization: Bearer <token>'

Search for onboarding and sort by date (most recent documents on top).

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=onboarding&sorting=-sortDate' \
  --header 'Authorization: Bearer <token>'

Relevance Boosting

Sometimes it is desired to prefer certain kinds of documents over others (e.g. recent News over Textbooks), assuming both match the actual query. This process of rule based adjustment of document relevance is referred to as boosting.

To enable boosting, search requests may contain a list of boost parameters. Each boost parameter associates a number (the boost weight) with a search expression, stating: if a matching document also matches the boost's search expression, it should gain the associated extra weight in relevance.

The syntax of a boost rule is as follows (in EBNF syntax)

boost = 'x' boost-weight ',' search-expression ;

boost-weight = ? double using '.' as decimal separator in the range (0, 64] ? ; <1>
search-expression = ? a search expression ? ; <2>
  1. A boost-weight in the range of (0, 1) will lower the score of a document and is considered a down-boost. A boost-weight in the range of (1, 64] will raise the score of a document and is considered an up-boost.
  2. The syntax of search expressions follows the rules outlined in Queries and Search Expressions.

Boosting Examples

{{host}}/content/v1/search?q=something&boost=x5,documentType:News
will assign an extra boost of factor 5 to any matching document that happens to contain the value News in its documentType metadata field

{{host}}/content/v1/search?q=dog&boost=x0.5,"shepherd"
will down-boost all documents containing the phrase "shepherd"
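
The boost rule syntax can be sketched as a small helper that checks the weight range before assembling the parameter value. The function name boost is hypothetical, and the sketch only builds the query string, so the host is left out.

```python
from urllib.parse import quote, urlencode


def boost(weight, expression):
    """Build one boost parameter value: 'x' boost-weight ',' expression."""
    if not 0 < weight <= 64:
        raise ValueError("boost weight must be in the range (0, 64]")
    return f"x{weight},{expression}"


# Query string for the down-boost example above.
params = [("q", "dog"), ("boost", boost(0.5, '"shepherd"'))]
print(urlencode(params, quote_via=quote))
```

Since a request may contain a list of boost parameters, further ("boost", ...) pairs can be appended to params in the same way.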

Search Term Snippeting

Finding the right search expressions tends to be an iterative process. During this process, users typically don’t want to read matching documents from start to finish but instead just need a quick glimpse at which parts of a document matched a certain search term and an extract of the content surrounding that match. This information, document extracts with an indication of which search term produced a match, is typically referred to as snippets.

The Search API provides a snippets parameter that controls the number, complexity and location within documents of these snippets. Its syntax is as follows (in EBNF syntax)

snippets = count {',' qualification} ;

count = 'count:' ( '0'    <1>
                 | posint <2>
                 ) ;

qualification = numberOfTokens 
              | location ;

numberOfTokens = 'numberOfTokens:' posint ; <3>

location = 'location:' ( locationDef | '(' locationDef ( ',' locationDef )* ')' ); <4>

locationDef = locationName ( '(count:' count ')' )? ; <5>

posint = ? positive integer ? ;
locationName = ( 'title' | 'baselineContent' ) ;
  1. Defining a count of 0 disables snippet generation and thus can improve response times.

  2. The count given will be the maximum number of snippets returned in the search response. The actual number of snippets may be lower if not enough hits were found in the document. A snippet may still have one or many highlights.

  3. The numberOfTokens defines the maximum number of tokens (typically words) contained in each snippet. A snippet may contain fewer tokens if the content does not have enough tokens. The highlighted token is included in that count.

  4. With location, all locations to generate snippets from can be defined. As visible in the locationName definition, only title and baselineContent are supported. The order in which the locations are given defines the order in which the snippets appear in the search result.

  5. A locationDef must specify a locationName. Additionally, defining a count is possible. The count given in the locationDef defines the maximum number of snippets generated for this location.

By default, up to 3 snippets with a maximum of 10 tokens (words) are generated for the locations title and baselineContent. Each snippet may contain up to 2 highlights.

Snippeting Examples

count:0
Disables snippeting for this request (will improve response times).

count:5
Returns up to 5 snippets.

count:5,numberOfTokens:25
Returns up to 5 snippets with a maximum of 25 tokens (words) each.

location:(title,baselineContent)
Returns matches only in title and baselineContent.

count:3,location:(title,baselineContent)
Returns up to 3 snippets, preferring matches from the document’s title, followed by matches from the document’s baselineContent section.

count:5,location:(title(count:1),baselineContent)
Returns up to 5 snippets, matches from the document’s title first, but limited to only one snippet, the rest of snippets will be created from the document’s baselineContent section.
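
The examples above can also be generated programmatically. The following minimal sketch builds a snippets parameter value according to the grammar. The function name snippets and its argument names are hypothetical.

```python
LOCATION_NAMES = ("title", "baselineContent")


def snippets(count, number_of_tokens=None, locations=None):
    """Build a snippets parameter value.

    locations is an ordered list of (location_name, max_count) pairs;
    a max_count of None omits the per-location count.
    """
    parts = [f"count:{count}"]
    if number_of_tokens is not None:
        parts.append(f"numberOfTokens:{number_of_tokens}")
    if locations:
        defs = []
        for name, max_count in locations:
            if name not in LOCATION_NAMES:
                raise ValueError("unsupported location: " + name)
            if max_count is None:
                defs.append(name)
            else:
                defs.append(f"{name}(count:{max_count})")
        parts.append("location:(" + ",".join(defs) + ")")
    return ",".join(parts)


print(snippets(5, locations=[("title", 1), ("baselineContent", None)]))
```

The returned string is the value of the snippets parameter and should be URL-encoded when it is added to the request URL.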

Facets

A common question surrounding the use of constraints when formulating a search request is how to obtain a list of valid or reasonable constraint values. One way of obtaining such values is facet listing.

For this, search requests may additionally contain instructions about which constraint values should be listed and how this listing should occur. More specifically, search requests support an arbitrary list of facet parameters (e.g. facet query parameters when using HTTP GET).

The content of each facet parameter has to follow this EBNF syntax:

facet = constraint-name {',' qualification} ; <1>

qualification = sorting 
              | limit 
              | drill-in 
              | match-count ;

sorting = 'sort:' ( 'count' | '-count' | 'value' | '-value' ) ; <2>

limit = 'limit:' posint ;

drill-in = 'drill:' ( xname | '(' xname {','xname} ')' ) ; <3>

match-count = 'count:' ( '>=' posint <4>
                       | '<=' posint <5>
                       | posint '...' posint <6>
                       ) ;

posint = ? positive integer ? ;
xname = ? expression name ? ; <7>
  1. Indicates the constraint for which values should be listed. If the name does not match any of the defined constraints, the whole facet parameter will be ignored.

  2. By default, facet values are listed with decreasing relevance (equivalent to sort:-count).

  3. For further details on using facet drill-in please refer to Advanced Facet Browsing.

  4. Allows constraining facet values to those occurring in e.g. at least 5 documents

  5. Allows constraining facet values to those occurring in e.g. at most 10 documents

  6. Allows constraining facet values to those occurring in e.g. more than 5 but fewer than 10 documents

  7. Refers to a named expression within the list of search expressions.

Facet Examples

documentType
Requests a listing of documentType values accepting a possibly defined default sorting and limit.

documentType,limit:20
Requests a listing of up to 20 documentType values (the first 20 values according to default sorting).

documentType,sort:value
Requests a listing of documentType values, to be retrieved in increasing value order.

documentType,sort:-value
Requests a listing of documentType values, to be retrieved in decreasing value order.

documentType,limit:20,sort:-count
Requests a listing of the 20 most common documentType values (up to 20 values in order of decreasing relevance).

documentType,count:>=3
Requests a listing of documentType values, skipping the long tail of values that occur in only two or fewer documents.

Facet concrete example (with curl)

To find possible values for filtering, you can explore the data using the faceting feature. Here, for example, we query a facet for tags and sort in descending order by the count of documents. That gives us the most frequently used tags in the set of documents we can access. In the result, the buckets of the tag facet will hold the values of tags.

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?facet=tag,limit:50,sort:-count' \
  --header 'Authorization: Bearer <token>'

Advanced Facet Browsing

facet=documentType
Requests a listing of documentType values accepting a possibly defined default sorting and limit.

facet=documentType,drill:documentType combined with q:documentType=documentType:ENTSCH
Returns hits only where the documentType is ENTSCH and the bucket has two count attributes:

  • count - the number of hits with drill i.e. with the documentType:ENTSCH constraint
  • countNonDrill - the number of hits without drill, i.e. without the documentType:ENTSCH constraint. This is useful when displaying facet results on a web page: when the user clicks on a particular facet value (a documentType), the search results for that documentType constraint are displayed on the right side of the page, while countNonDrill keeps the numbers for the categories (the facets) on the left side of the page unchanged.

The drill feature can also be used to get the number of hits per category and subcategory.

Facet with drill concrete example (with curl)

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=steuer&q:docTypeString=documentType:BEITRAG&facet=packageId,limit:100,sort:-count,drill:docTypeString' \
  --header 'Authorization: Bearer <token>'

The buckets from the response of a request like this will again have two count values:

  • countNonDrill - the number of documents per packageId, i.e. category
  • count - the number of documents per packageId and documentType, i.e. subcategory
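
The facet parameter values used in this section follow a regular pattern, so they can be assembled by a small helper. This is a minimal sketch following the facet grammar. The function name facet and its argument names are hypothetical.

```python
SORT_ORDERS = ("count", "-count", "value", "-value")


def facet(name, limit=None, sort=None, count=None, drill=()):
    """Build the value of one facet parameter.

    count is a match-count such as ">=3", "<=10" or "5...10";
    drill is a list of expression names (a single name may be a string).
    """
    if isinstance(drill, str):
        drill = [drill]
    parts = [name]
    if limit is not None:
        parts.append(f"limit:{limit}")
    if sort is not None:
        if sort not in SORT_ORDERS:
            raise ValueError("unsupported sort order: " + sort)
        parts.append("sort:" + sort)
    if count is not None:
        parts.append("count:" + count)
    names = list(drill)
    if len(names) == 1:
        parts.append("drill:" + names[0])
    elif names:
        parts.append("drill:(" + ",".join(names) + ")")
    return ",".join(parts)


print(facet("packageId", limit=100, sort="-count", drill="docTypeString"))
```

The printed value matches the facet parameter of the drill request above. The named expression it refers to (q:docTypeString=...) still has to be added to the request separately.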

Stemming

Stemming maps a word to its common lemma (stem). For example, Kinder stems to the noun Kind and gearbeitet stems to the verb arbeiten.

An unstemmed search matches only the word form you’re searching for. For example, searching for Kind will not match a document containing Kinder. With stemming, the search matches the exact term, plus words with the same stem. Thus, a search for arbeiten will also match documents containing arbeitend or gearbeitet because they all share the stem arbeiten in German.

Aggregating Result Sets

Use the aggregation keyword to group search results and return the most relevant hits per group.

aggregation = group-by {',' match-count ',' discriminator} ; <1>

match-count = 'sample:' posint ; <2>
 
discriminator = 'discriminator:' QName ; <3>
  1. Aggregation is a group by clause where the most relevant hit per discriminator is selected.

  2. The grouped sample size. At the moment, sample:1 is the only implemented value.

  3. The discriminator specifies the QName of the element by which the hits will be grouped. Valid values are:

    • {http://idesk.haufe-lexware.com/document-meta}rootId
    • {http://contenthub.haufe-lexware.com/haufe-document}documentType

Example:

group-by,sample:1,discriminator:{http://idesk.haufe-lexware.com/document-meta}rootId

Invisible Documents

Invisible documents are all documents with the metadata field visible set to "false". By default, ContentHub will exclude invisible documents, unless the used search expression explicitly contains a scoped expression for the field visible.

Examples:

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/search?q=Steuer' \
  --header 'Authorization: Bearer <token>'

Searches for documents containing "Steuer". Only visible documents will be returned.

curl --location --get 'https://api.contenthub.haufe.io/content/v1/search' \
  --data-urlencode 'q=linklist visible:false' \
  --header 'Authorization: Bearer <token>'

Searches explicitly for invisible documents containing "linklist".

curl --location --get 'https://api.contenthub.haufe.io/content/v1/search' \
  --data-urlencode 'q=Arbeit visible:(true OR false)' \
  --header 'Authorization: Bearer <token>'

Searches for visible and invisible documents containing "Arbeit".

Retrieving Content

The Retrieval API allows consumers to retrieve full documents (including attachments) or specific parts of documents. The root context of the following RESTful paths is /content/v1.

Retrieving Full Documents

Issue GET on the /retrieval URI with the following parameters

  • query parameter contentHubId to identify the document to be retrieved

  • optional query parameter withBlobs to additionally retrieve any images or blobs embedded in the document. Values are true or false with default false.

Retrieving Specific Sections of a Document

Issue GET on the following URIs (query parameters are exactly as before unless specified)

  • /retrieval/meta retrieves the document meta data section

  • /retrieval/baseline retrieves the baseline content section. A document teaser can additionally be requested using the query parameter teaser. This parameter is of type int and specifies the number of characters in the final teaser.

  • /retrieval/native retrieves the native content section

Retrieving Blobs

Blobs in this context refer to any kind of binary attachment to a main document. The parameter contentHubId must be specified with the following paths to identify the main document.

  • /retrieval/blobs retrieves all binary attachments in one multipart response

  • /retrieval/blobs/{blobId} retrieves the named binary attachment, where blobId is a path variable representing the id of the attachment

  • /retrieval/blobs/{blobId}/meta retrieves the meta data of the named binary attachment

Examples

/content/v1/retrieval?contentHubId=contenthub://portals/content/215516&withBlobs=true : retrieve a full document and all its binary attachments.

/content/v1/retrieval/baseline?contentHubId=contenthub://portals/content/215516&teaser=100 : retrieve the baseline content of a document and a 100-character teaser.

/content/v1/retrieval/blobs/2214.pdf?contentHubId=contenthub://portals/content/215516 : retrieve blob 2214.pdf from the given document
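
The paths and parameters described above can be assembled programmatically. The following Python sketch is one possible way to do so; the helper names retrieval_url and blob_url are illustrative assumptions, and the sketch assumes the API accepts a percent-encoded contentHubId, as is standard for query parameters.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.contenthub.haufe.io/content/v1"


def retrieval_url(content_hub_id: str, section: Optional[str] = None,
                  with_blobs: bool = False, teaser: Optional[int] = None) -> str:
    """Build a URL for /retrieval or /retrieval/{meta|baseline|native}."""
    path = "/retrieval" if section is None else f"/retrieval/{section}"
    params = {"contentHubId": content_hub_id}
    if with_blobs:
        params["withBlobs"] = "true"      # the documented default is false
    if teaser is not None:
        params["teaser"] = str(teaser)    # teaser length in characters
    return BASE_URL + path + "?" + urlencode(params, quote_via=quote)


def blob_url(content_hub_id: str, blob_id: Optional[str] = None,
             meta: bool = False) -> str:
    """Build a URL for all blobs, a single blob, or a single blob's meta data."""
    path = "/retrieval/blobs"
    if blob_id is not None:
        path += "/" + quote(blob_id, safe="")
        if meta:
            path += "/meta"
    query = urlencode({"contentHubId": content_hub_id}, quote_via=quote)
    return BASE_URL + path + "?" + query


doc = "contenthub://portals/content/215516"
print(retrieval_url(doc, with_blobs=True))
print(retrieval_url(doc, section="baseline", teaser=100))
print(blob_url(doc, "2214.pdf"))
```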

Concrete examples (with curl)

Full document retrieval

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/retrieval?contentHubId=contenthub://idesk/HI13538112' \
  --header 'Authorization: Bearer <token>'

Baseline content retrieval

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/retrieval/baseline?contentHubId=contenthub://idesk/HI13538112' \
  --header 'Authorization: Bearer <token>'

Baseline content with teaser retrieval

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/retrieval/baseline?contentHubId=contenthub://idesk/HI1856776&teaser=500' \
  --header 'Authorization: Bearer <token>'
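
For callers who prefer Python to curl, the requests above can be prepared with the standard library. This is a minimal sketch: the helper name authorized_get is an assumption, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def authorized_get(url: str, token: str) -> Request:
    """Prepare a GET request carrying the bearer token, as in the curl examples."""
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


request = authorized_get(
    "https://api.contenthub.haufe.io/content/v1/retrieval/baseline"
    "?contentHubId=contenthub://idesk/HI1856776&teaser=500",
    "<token>",  # placeholder: substitute the token obtained from the token endpoint
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```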

Retrieving a Compound Document

This operation returns an Atom collection of all ContentHub documents that belong to a compound document.

Issue GET on the /retrieval/compound/{documentPart} URI with the following parameters:

  • path parameter documentPart is the part that should be returned. This can be either:

    • full for the complete document,
    • meta for metadata only,
    • baseline for metadata plus baseline content or
    • native for metadata plus the native content.
  • query parameter contentHubId to identify the document to be retrieved

  • optional query parameter withBlobs to additionally retrieve any images or blobs embedded in the document. Values are true or false with default false.

  • optional query parameter constrainingQuery defines an additional query that all returned document parts have to match. This can be used to filter the returned compound document down to those parts that should be visible to the user.

Concrete example (with curl)

curl --location --request GET 'https://api.contenthub.haufe.io/content/v1/retrieval/compound/full?contentHubId=contenthub://idesk/LI7635254' \
  --header 'Authorization: Bearer <token>'
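
The compound retrieval parameters can be combined as in the following Python sketch. The function name compound_url is an illustrative assumption, and the value "Steuer" used for constrainingQuery is only a placeholder query.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.contenthub.haufe.io/content/v1"
DOCUMENT_PARTS = ("full", "meta", "baseline", "native")


def compound_url(content_hub_id: str, document_part: str = "full",
                 with_blobs: bool = False,
                 constraining_query: Optional[str] = None) -> str:
    """Build a URL for /retrieval/compound/{documentPart}."""
    if document_part not in DOCUMENT_PARTS:
        raise ValueError(f"documentPart must be one of {DOCUMENT_PARTS}")
    params = {"contentHubId": content_hub_id}
    if with_blobs:
        params["withBlobs"] = "true"
    if constraining_query:
        # Every returned document part has to match this query.
        params["constrainingQuery"] = constraining_query
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}/retrieval/compound/{document_part}?{query}"


print(compound_url("contenthub://idesk/LI7635254"))
print(compound_url("contenthub://idesk/LI7635254", "baseline",
                   constraining_query="Steuer"))
```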

Bulk Export

Creating Bulk Export Jobs

This operation registers a new job that will be processed in the background. It returns a job descriptor containing the properties of the job.

Issue POST on the /retrieval/bulk/job URI with a JSON object that may contain

  • a search descriptor specifying the documents to be included in the export

  • a retention period that defines how long the export will be kept available

  • whether to include the searchable text of the documents in the export

Getting All Export Jobs

This operation returns a list of all existing export jobs.

Issue GET on the /retrieval/bulk/job URI to get a JSON list of job descriptors.

Get Descriptor of Single Job

This operation returns the descriptor of a bulk export job, which shows whether the job has finished and the export can be fetched.

Issue GET on the /retrieval/bulk/job/{bulkExportJobId} URI with the bulkExportJobId variable referring to the UUID of the export job.

If the status property in the response object is set to FINISHED, the export can be downloaded from the link given in the fetchUrl property.
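
Since the job runs in the background, a client typically polls the descriptor until the status is FINISHED. The following Python sketch shows one way to do this. The function names, the number of attempts and the delay are assumptions; only the status value FINISHED and the fetchUrl property come from the description above, and any other status is simply treated as "not finished yet".

```python
import json
import time
from typing import Callable, Optional
from urllib.request import Request, urlopen

BASE_URL = "https://api.contenthub.haufe.io/content/v1"


def get_job_descriptor(job_id: str, token: str) -> dict:
    """Fetch the descriptor of one bulk export job as a dictionary."""
    request = Request(f"{BASE_URL}/retrieval/bulk/job/{job_id}",
                      headers={"Authorization": f"Bearer {token}"})
    with urlopen(request) as response:
        return json.load(response)


def wait_for_export(job_id: str, token: str,
                    fetch: Callable[[str, str], dict] = get_job_descriptor,
                    attempts: int = 10, delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll the job descriptor; return fetchUrl once FINISHED, else None."""
    for attempt in range(attempts):
        descriptor = fetch(job_id, token)
        if descriptor.get("status") == "FINISHED":
            return descriptor["fetchUrl"]
        if attempt < attempts - 1:
            sleep(delay)  # wait before asking again
    return None
```

Passing fetch and sleep as parameters keeps the polling logic testable without network access; in normal use the defaults can be left as they are.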

Cancel an Export Job

This operation schedules an export job for cancellation. The job is canceled asynchronously, usually after the operation has returned.

Issue DELETE on the /retrieval/bulk/job/{bulkExportJobId} URI with the bulkExportJobId variable referring to the UUID of the export job.
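
A cancellation request can be prepared in Python as sketched below. The helper name cancel_request is an assumption, and the request is only constructed here, not sent.

```python
from urllib.request import Request

BASE_URL = "https://api.contenthub.haufe.io/content/v1"


def cancel_request(job_id: str, token: str) -> Request:
    """Prepare a DELETE request that schedules the given export job for cancellation."""
    return Request(f"{BASE_URL}/retrieval/bulk/job/{job_id}",
                   headers={"Authorization": f"Bearer {token}"},
                   method="DELETE")


request = cancel_request("<bulkExportJobId>", "<token>")  # placeholders
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Because cancellation happens asynchronously, the job descriptor may need to be fetched again afterwards to see the job's current state.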
