Parent/Child

Using document references, documents can have parent/child relationships. Use this to join data by importing fields from parent documents. Features:

  • simplify document operations - one write to update one value
  • no denormalization needed - a value is updated once in the parent, and the update applies atomically to all children
  • search parent documents only, or all documents

An alternative to parent documents is using arrays of struct fields - this guide covers both.

Common use cases are applications with structured data, such as commerce (e.g. products with multiple sellers) and advertising (advertisers with campaigns with ads, where budgets need real-time updates).

High-level overview of documents, imported fields and array fields:

Parent documents

Model parent-child relationships by using references to global documents. This is like foreign keys in a relational database. Parents can have parents. A document can have references to multiple parents - the parents can be of the same or different types.

Using a reference, fields can be imported from parent types into the child's search definition and used for matching, ranking, grouping and sorting. A reference is a string attribute with the parent's document ID as value. References are hence weak:

  • no cascade delete
  • a referenced document can be non-existent - imported fields do not have values in this case
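
As a minimal feed sketch of both points, using the document types from the Example section below: the first operation removes a parent without touching its children, and the second puts a child whose reference points to a parent that was never fed. The document IDs id:test:campaign::orphan and id:test:advertiser::missing are hypothetical and only serve as illustration.

```json
[
  { "remove": "id:test:advertiser::cool" },
  { "put": "id:test:campaign::orphan", "fields": {
      "advertiser_ref": "id:test:advertiser::missing",
      "budget": 5 }
  }
]
```

Both operations are accepted. Campaigns that referenced the removed advertiser stay in place, and in both cases the imported advertiser_name field has no value.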

When using parent-child relationships, data does not have to be denormalized, as fields from parents are imported into children. If a field's value is shared between many documents, store it in the parent and update it there to limit the number of updates. This also limits the resources (memory / disk) required to store and handle documents on content nodes.
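
A minimal sketch of such a shared-value update, using a campaign document from the Example section below. The new budget value 30 is arbitrary and chosen for illustration only.

```json
[
  { "update": "id:test:campaign::thebest", "fields": {
      "budget": { "assign": 30 } }
  }
]
```

This is a single write to the parent. Every ad that imports budget through a reference to this campaign sees the new value, and no ad document is written.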

At cluster changes, global documents are merged to new nodes before regular documents. For consistency, a content node is not serving queries before all global documents are synchronized - refer to elastic Vespa for details.

References and imported fields are not supported in streaming mode.

Performance notes:

  • As parent documents are global, a PUT or UPDATE will execute on all content nodes. Node capacity will limit the number of such documents - there should normally be an order of magnitude fewer parent documents than child documents
  • Memory usage grows accordingly. A global document is otherwise equal to a regular document, but each content node must be sized to hold all global documents plus its share of regular documents
  • Reference fields add a memory indirection and do not impact query performance much
  • Search performance notes

Multi-value fields

A document can have fields that are arrays or maps of struct. Structs and documents are similar - a set of field name/value pairs. One-to-many mappings can hence be implemented this way, as an alternative to using parent/child.

sameElement() is a useful query operator to restrict matches to the same struct element.
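
The sketch below assumes a hypothetical product document type with a field properties, an array of a struct with the string fields name and value, both searchable. Neither the type nor the field names are from this guide's example. A document with two struct elements could be fed as:

```json
[
  { "put": "id:test:product::shoe", "fields": {
      "properties": [
        { "name": "shoesize", "value": "42" },
        { "name": "color", "value": "red" }
      ] }
  }
]
```

A query body that requires the name and the value to match within one element, assuming the query is posted as JSON to the search API:

```json
{
  "yql": "select * from product where properties contains sameElement(name contains \"shoesize\", value contains \"42\");"
}
```

Without sameElement(), separate conditions on properties.name and properties.value could be satisfied by different elements, so a query for name shoesize and value red would also match this document.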

Parent or multi-value?

As a rule of thumb, model the items searched for as the document - for example, products for sale. Shared properties, like vendor, can be modeled using a parent document, importing a vendor name field - assuming a vendor has many products, and the vendor list is limited. Use arrays of structs for properties that documents might have, like shoe size or screen resolution - one can then have a struct field for property name and another for property value, giving a flexible structure for products with an unlimited set of possible properties.

Example

search advertiser {
  document advertiser {
    field name type string {
      indexing: attribute
    }
  }
}

[
  { "put": "id:test:advertiser::cool", "fields": { "name": "cool" } }
]

search campaign {
  document campaign {
    field advertiser_ref type reference<advertiser> {
      indexing: attribute
    }
    field budget type int {
      indexing: attribute
    }
  }
  import field advertiser_ref.name as advertiser_name {}
}

[
  { "put": "id:test:campaign::thebest", "fields": {
      "advertiser_ref": "id:test:advertiser::cool",
      "budget": 20 }
  },
  { "put": "id:test:campaign::nextbest", "fields": {
      "advertiser_ref": "id:test:advertiser::cool",
      "budget": 10 }
  }
]

search salesperson {
  document salesperson {
    field name type string {
      indexing: attribute
    }
  }
}

[
  { "put": "id:test:salesperson::johndoe", "fields": { "name": "John Doe" } }
]

search ad {
  document ad {
    field campaign_ref type reference<campaign> {
      indexing: attribute
    }
    field other_campaign_ref type reference<campaign> {
      indexing: attribute
    }
    field salesperson_ref type reference<salesperson> {
      indexing: attribute
    }
  }

  import field campaign_ref.budget as budget {}
  import field salesperson_ref.name as salesperson_name {}
  import field campaign_ref.advertiser_name as advertiser_name {}

  document-summary my_summary {
    summary budget type int {}
    summary salesperson_name type string {}
    summary advertiser_name type string {}
  }
}

[
  { "put": "id:test:ad::1", "fields": {
      "campaign_ref": "id:test:campaign::thebest",
      "other_campaign_ref": "id:test:campaign::nextbest",
      "salesperson_ref": "id:test:salesperson::johndoe" }
  }
]

Document type ad has two references to campaign (via campaign_ref and other_campaign_ref) and one reference to salesperson (via salesperson_ref). The budget field from campaign is imported into the ad search definition (via campaign_ref) and given the name budget. Similarly, the name of salesperson is imported as salesperson_name. Document type campaign has a reference to advertiser and imports the field name as advertiser_name. Document type ad in turn imports advertiser_name via campaign_ref, so the value originates from its grandparent advertiser. To use the imported fields in summary, define a document summary, here my_summary, containing these fields.
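
As a hedged sketch of using the imported fields at query time, the query body below matches on advertiser_name and budget and requests the my_summary document summary. It assumes the query is posted as JSON to the search API; the threshold 15 is arbitrary.

```json
{
  "yql": "select * from ad where advertiser_name contains \"cool\" and budget > 15;",
  "presentation.summary": "my_summary"
}
```

With the documents fed above, id:test:ad::1 should satisfy both conditions, as its campaign_ref points to a campaign with budget 20 whose advertiser is named cool.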