Advanced search

Technical background

opendata.swiss has a very powerful search engine, that can help you to find exactly the datasets you want. The search is provided by the open source component Apache Lucene/Solr. Every dataset is indexed by Solr when it gets updated, and if you perform a search on the portal, this index is queried to efficiently deliver results.

The search index is basically the "database" where all the information for the search is saved. It uses a custom schema with all the dataset fields that should be indexed. The schema is flat, i.e. nested elements like resources must be saved differently, in order for Solr to index them. The same applies to the multilingual fields, which are all stored with the language suffix, e.g. keywords_en contains the English keywords.

By default, all the fields that belong to a dataset are copied in one field (called "text"), so that the search process only has to check one field to find a match. So if a user submits a search with the query "weather", Solr runs this query against the "text" field of all datasets.

Search Index

The search index contains the following fields:

URLs: url ckan_url download_url res_url

Text-fields: extras_* res_extras_* urls name title title_string text license notes tags groups organization res_name res_format res_description identifier see_alsos maintainer author publishers * contact_points

Translated fields: title keywords groups organization res_name res_description

Find more detailed information about the Solr configuration in the official Solr documentation. The config and schema of opendata.swiss is available on GitHub: solrconfig.xml schema.xml

The source of the referenced files in the solr.xml (e.g. italian_stop.txt, fr_elision.txt, etc.) can be found in the official CKAN-Repository of the current CKAN-Version on Github. All other files (e.g. stopwords.txt) are provided by Solr.

Query syntax

Solr has its own query syntax to write complex queries. Depending on the query, Solr uses a different query parser to determine what to do.

Search operators

Use +{field}:{value} to include a search term, e.g. +title_en:power to find all datasets, whose English title contains the word "power"
Use -{field}:{value} to exclude a search term, e.g. +title_en:power -title_en:hydraulic to find all datasets, whose English title contains the word "power", but not "hydraulic"
Use AND to combine several search terms that all must match, e.g. keywords_en:(geology AND geophysics) tp find all datasets that have both tags geology and geophysics
Use OR to combine several search terms, where one of them must match, e.g. organization:(kanton-thurgau OR stadt-zurich)

All of these options can be further combined together, e.g. organization:(kanton-thurgau OR stadt-zurich) karte

Searchterm suggestions

The search-field of opendata.swiss provides searchterm-suggestions when a user types into it. For each language a self-contained Solr index is built multiple times throughout the day. That means that changes to datasets or new data won't be reflected in the suggestions immediately.

The index is based on the following fields:

dataset-title (translated)
keywords (translated)
groups (translated)
organization (translated)
distribution-name (translated)
author
maintainer
contact_points
publishers
identifier
distribution-format