Kalkati.net, XML database dump
The purpose of this document is to act as a technical guide to the Kalkati.net XML representation of Matka.fi timetable data, also known as the timetable dump file. The dump file represents the line and timetable data of the Matka.fi timetable database exported into a single file. The dump file facilitates the interchange of timetable data between Matka.fi and external parties.
Kalkati.net is the name of the XML format used in the dump file. The purpose of the Kalkati.net timetable data transfer format is to unify timetable data presentation between different systems. This document describes how the Kalkati.net XML format is applied to Matka.fi timetable data. Its main goal is to give a developer the essential knowledge needed to use the dump file data in building new applications. Note that this guide is not a complete specification or reference for the dump file, but rather a technical introduction to it.
2 Getting started
2.1 Links
- Dump file – api.matka.fi/data/all.zip
- Dump file's validity flag – api.matka.fi/data/flag.txt
- Kalkati.net XML schema (for timetable data) – api.matka.fi/data/kalkatischema.zip
- Kalkati.net standards – www.kalkati.net
- Kalkati to GTFS converter – github.com/HSLdevcom/kalkati2gtfs
- Matka.fi – www.matka.fi
2.2 Dump file
The dump file represents the line, route, and timetable data of the Matka.fi timetable database exported into a single file. As mentioned in the introduction, the role of the dump file is to enable easy interchange of data between Matka.fi and external transport providers and to enable third party developers to utilize the timetable data.
At the technical level, the dump file is a zipped XML file that contains a variety of data artifacts. The XML of the dump file is based on the standard Kalkati.net schema. The dump file is generated from the database once every week. The generation date and validity period of the dump file are given in the flag.txt file (located in the same directory). Authorized users can download the dump file from the Matka.fi server; the URL is listed in chapter “2.1 - Links”. Downloading the file requires a password, which may be obtained from Liikennevirasto.
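As an illustration of working with the zipped XML, the following minimal Python sketch opens an archive and parses the first XML member it finds. The member name `sample.xml`, the root tag `dump`, and the helper name `load_dump_root` are placeholders chosen for this example; the real archive layout is not specified here, and downloading (including the password handling) is deliberately left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def load_dump_root(zip_source):
    """Return the root element of the first .xml member in a dump archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        if not xml_names:
            raise ValueError("no XML file found in the archive")
        with archive.open(xml_names[0]) as xml_file:
            return ET.parse(xml_file).getroot()


# Build a tiny stand-in archive in memory (placeholder content, not real data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.xml", "<dump><Company CompanyId='1' Code='YTV'/></dump>")
buffer.seek(0)

root = load_dump_root(buffer)
print(root.tag, len(root))
```

For the real dump file, which can be large, loading the whole tree into memory may not be practical; a streaming parser is likely the better choice.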
Matka.fi (or Journey.fi in English) is an online travel guide for public transport in Finland. Matka.fi allows users to search for the best public transport connections between two selected locations. The service includes all buses, trams, metro, commuter trains, and ferries. Matka.fi suggests the most suitable connections available at the given time.
Kalkati.net is a transport and cargo standards project initiated by the Finnish Ministry of Transport and Communications. It is a set of standards that aims to standardize communication between Finnish public transport operators. Kalkati.net standards also benefit cargo companies. The main goal of the standards is to allow companies to exchange data more easily. This is expected to lead to better collaboration among Finnish transport operators and companies. The standards include a set of process diagrams specifying the communication patterns for exchanging data, and a set of XML Schemas specifying the structure of the exchanged XML documents.
2.5 XML schema
XML Schema is a W3C standard used to specify the data models of XML documents. A schema document specifies the elements and data types that may be used in instance documents (documents that are based on the schema).
Below are some links to help a developer get started with XML Schema. The links include the official XML Schema specification, references, and tutorials.
2.6 Useful tools
XML documents, especially XML Schemas, are often difficult to read as plain text files. To get a more convenient view of an XML or XML Schema document, one may use an XML visualization tool. Below are some (freeware) tools for this purpose.
- Eclipse + XML plug-ins – www.eclipse.org
- Netbeans (full version) – www.netbeans.org
- Microsoft XML notepad (Windows only) – www.microsoft.com
3 Kalkati.net XML File Structure
This chapter serves as a guide to the Kalkati.net XML Schema. It explains the most important elements of the schema and their attributes. Because the dump file is an instance document of the Kalkati.net schema, this chapter also applies to the structure of the Matka.fi dump file. Thus, this chapter is a guide both to the Kalkati.net schema and to the dump file’s structure.
We do not intend to give a thorough explanation of all elements and attributes of Kalkati.net XML here, only a short overview of them. The exact details about the use of elements and their attributes can always be found in the Kalkati.net XML Schema document. Get the latest dump file and schema from the URLs specified in chapter “2.1 - Links“.
“Occurrences” states how many times an element must (or may) occur in the document. The possible values are 0..1 (at most once), 1..1 (exactly once), 0..* (any number of times, including zero), and 1..* (at least once). In the “attributes” section of an element definition, optional attributes are marked with the text “optional”. If there is no “optional” marking, the attribute is always used. The “notes” sections may give special instructions about interpreting the dump file or about dealing with specific issues of Kalkati.net XML in general.
Figure 1 presents the structure of the dump file visualized in the Netbeans XML tool. Appendix B presents the same structure visualized in the XMLSpy tool.
Figure 1: Structure of the dump file
<!-- elements here -->
Description: Is the root element of the XML document.
- version (optional) defines the version number of the jpdb schema the document conforms to.
Description: Gives the period of time over which the following timetables are valid and identifies the providing company. Unless otherwise noted, the first date in a footnote vector is the date of the delivery period’s start.
- Firstday defines the starting date and time of the delivery period.
- Lastday defines the ending date and time of the delivery period.
- CompanyId defines the provider company of the timetable data (by referencing <Company>).
- Description (optional) describes the delivery period of the timetables.
- Version (optional) defines the version string assigned by the provider company to these timetables.
<Company CompanyId='1' Name='YTV' Time='0000' Code='YTV'/>
Description: Defines a timetable provider or transport operator that provides transport services.
- CompanyId is the identification number of the company (for referencing from other elements).
- Code defines the transport provider assigned code for the company.
- Name (optional) defines the company name.
- Time (optional) defines the time of day when the date changes for this company. This is purely informational and has no effect on how the timetables are interpreted.
Notes: Time should never be used in routing calculations; its role is purely informational. The companies in the <Company> elements include not only the main transport operators but also the subcontractor operators. Name is always the original Finnish name of the company. <Company> elements may also have synonyms (translations).
<Country CountryId='fi' Name='Suomi' Inland='1'/>
<Country CountryId='se' Name='Ruotsi'/>
<Country CountryId='ru' Name='Venäjä'/>
<Country CountryId='en' Name='Englanti'/>
Description: Defines a country that is involved in the transport service.
- CountryId is the ID of the country (for referencing from other elements).
- Name (optional) is the name of the country.
- Inland (optional, boolean) defines whether the country is regarded as the “homeland” (inland) of the transport service.
Notes: In the dump file, <Country> is always Finland. CountryId values are always in lower case. Name values are always in the default language (Finnish).
<Period Firstday='1970-01-01T00:00:00.0+00:00' Lastday='2100-01-
Description: Defines a time zone that is in use in the transport service area. The time zone details are defined in the <Period> child elements.
Note: In Matka.fi data timezone is not used to handle winter/summer time. Therefore Matka.fi data is always in +2 timezone.
- TimezoneId is the ID of the time zone (for referencing from other elements).
Description: Defines the offset from the mean time (UTC+DST) during a period of time.
- Difference defines the time difference to UTC+DST. DST is the standard daylight saving time (used in the EU).
- Firstday defines the start date and time (in UTC) of the period.
- Lastday defines the end date and time (in UTC) of the period.
Notes: In the dump file the time zone is always UTC+DST+02:00. Currently DST is +1 hour between the last Sunday of March 01:00 (UTC) and the last Sunday of October 01:00 (UTC). Note that the periods must never overlap.
<Language LanguageId='fi' Description='Suomi' isDefault='true'/>
<Language LanguageId='sv' Description='Ruotsi'/>
<Language LanguageId='en' Description='Englanti'/>
<Language LanguageId='ru' Description='Venäjä'/>
Description: Defines a language used in the service.
- LanguageId is the ID of the language (for referencing from other elements).
- Description (optional) is the name/description of the language.
- isDefault (optional, boolean) defines whether the language is the default language of the service. If isDefault is true, all Name and Description attributes are assumed to be in this language. Only one <Language> element may set isDefault to true.
Notes: <Language> elements are analogous to <Country> elements. LanguageId is normally two letters but may also be three. LanguageId is always written in lower case. Note that the code for the Swedish language is “sv”, not “se”.
<Station StationId='2241206' Name='Mankkaankallio' Minchangetime='0' TimezoneId='1' CountryId='fi' city_id='2' X='2542631.0' Y='6676314.0' type='0'/>
Description: Defines a station or stop in the service area.
- StationId is the ID of the station. In Matka.fi data this is a unique 7-digit numeric code.
- Name (optional) defines the station name.
- CountryId defines the country the station is located in (by referencing <Country>).
- TimezoneId defines the time zone the station is in (by referencing <Timezone>).
- Minchangetime and MaxchangeTime (optional) define the default minimum/maximum time (in minutes) used when changing vehicles at this station (may be overridden in <Change>).
- city_id (optional) defines the identification number of the city or county the station is located in.
- X and Y attributes (optional) define the coordinates of the station in KKJ3 coordinate system.
- GlobalId (optional) defines a digistop ID for the station (see www.digistop.net).
- IsVirtual (optional, boolean) defines whether the station is virtual rather than a real physical station. For example, a virtual station “Helsinki” is not a real station but represents a set of stops around the railway station.
- Type (optional) is an alias of the isVirtual attribute - if 1 then the station is virtual, otherwise not.
- stop_area (optional) is used to group stations. Stations with the same stop_area value belong to the same group.
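The attribute list above can be turned into a small record type. The sketch below is one possible way to read a <Station> element in Python; the `Station` dataclass and `parse_station` helper are names invented for this example. Because this guide spells the virtual-station attributes both as IsVirtual/isVirtual and Type/type, the code checks both spellings, which is an assumption rather than schema-verified behaviour.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    station_id: str
    name: Optional[str]
    x: Optional[float]  # KKJ3 coordinate, if present
    y: Optional[float]  # KKJ3 coordinate, if present
    is_virtual: bool


def _to_float(value):
    return float(value) if value is not None else None


def parse_station(elem):
    """Build a Station record from a <Station> element."""
    # Assumption: either spelling of the attribute may occur.
    flag = elem.get("IsVirtual") or elem.get("isVirtual")
    type_code = elem.get("Type") or elem.get("type")
    is_virtual = flag in ("1", "true") or type_code == "1"
    return Station(
        station_id=elem.get("StationId"),
        name=elem.get("Name"),
        x=_to_float(elem.get("X")),
        y=_to_float(elem.get("Y")),
        is_virtual=is_virtual,
    )


elem = ET.fromstring(
    "<Station StationId='2241206' Name='Mankkaankallio' Minchangetime='0' "
    "TimezoneId='1' CountryId='fi' city_id='2' X='2542631.0' Y='6676314.0' type='0'/>"
)
station = parse_station(elem)
print(station)
```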
<Trnsattr TrnsattrId='1001' Name='Makuupaikka'/>
<Trnsattr TrnsattrId='2001' Name='Makuupaikka, 1. luokka' AttrType='1001'/>
<Trnsattr TrnsattrId='2002' Name='Makuupaikka, 2. luokka' AttrType='1001'/>
Description: Defines a transportation attribute. These elements are used to provide additional (free-form) information about services (<Service> elements).
- TrnsattrId is the ID of the attribute (for referencing from other elements).
- Name (optional) is the name of the attribute.
- AttrType (optional) attributes are used for grouping of similar attributes (see the example above). The attribute values may be in a local or global scope (namespace). If the attribute value (or code) is defined in a global scope it means different timetable providers can use same codes.
- The purpose of the Processcode attribute (optional) is not known at the time of writing.
Notes: <Service> elements refer to the <Trnsattr> elements through the <ServiceAttribute> elements. TrnsattrId attributes are always integers. Name is always in Finnish. The AttrType global codes may be found in a database or in code specifications (such as those of the Trident project).
<Trnsmode TrnsmodeId='6' Name='Metroliikenne'/>
Description: Defines a single transport mode (vehicle type) used in the transport service area.
- TrnsmodeId is the identifier of the transport mode, enabling references from other elements. It may be an integer or a common short code for the transport mode.
- Name (optional) is the textual representation of the transport mode.
- Modetype (optional) attributes are used to refer to the universal transport mode codes, the “Trident project” codes. These same codes have been used with many transport databases in Finland. See Appendix A for a list of these codes.
Notes: Modetype-attributes are always integers. Name is always in Finnish.
<Station StationId='6020215' Name='Masaby bibliotek'/>
<Trnsmode TrnsmodeId='1' Name='Buss'/>
<!-- more.. -->
Description: <Synonym> elements are used for providing translations of texts in other elements. A <Synonym> element is actually only a container element, the translations are given as child elements for it. The child elements are always of form <Element id=’key’ name=’translated text’/>. The Element may be one of the following: Language, Company, Country, Station, Trnsattr, or Trnsmode. The id attribute refers to the ID of the element that has been translated. The name attribute, which is the actual translation, overrides the value in the original element.
- LanguageId defines the language the synonym is in. It references the <Language> element.
Notes: In the case of <Language> translation attribute Description is used instead of the Name attribute. This is because the <Language> element has no name attribute.
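One way to use the synonyms is to build a lookup table keyed by language, element type, and ID. The Python sketch below assumes that each child of <Synonym> carries its ID in an attribute named after the element plus "Id" (StationId, TrnsmodeId, and so on), as in the example above, and that <Synonym> elements can be reached from the document root. The root tag `dump` and the helper names are placeholders for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<dump>
  <Synonym LanguageId='sv'>
    <Station StationId='6020215' Name='Masaby bibliotek'/>
    <Trnsmode TrnsmodeId='1' Name='Buss'/>
  </Synonym>
</dump>
"""


def build_synonyms(root):
    """Map (language, element tag, element ID) to the translated text."""
    table = {}
    for synonym in root.iter("Synonym"):
        lang = synonym.get("LanguageId")
        for child in synonym:
            key = child.get(child.tag + "Id")
            # <Language> has no Name attribute, so Description is used instead.
            text_attr = "Description" if child.tag == "Language" else "Name"
            text = child.get(text_attr)
            if key is not None and text is not None:
                table[(lang, child.tag, key)] = text
    return table


def translated(table, lang, tag, key, original):
    """Return the translation if one exists, otherwise the original text."""
    return table.get((lang, tag, key), original)


synonyms = build_synonyms(ET.fromstring(SAMPLE))
print(translated(synonyms, "sv", "Station", "6020215", "ORIGINAL NAME"))
print(translated(synonyms, "en", "Station", "6020215", "ORIGINAL NAME"))
```

Falling back to the original text mirrors the rule that a synonym overrides the value in the original element only when a translation is present.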
Description: DEPRECATED - Enables the definition of new services by combining the existing services (<Service> elements).
Notes: The element is deprecated and is not in use in the dump file. <Change> elements are used now instead of <Thrusrvc> because they are more easily interpreted.
<Change ServiceId1='123' ServiceId2='345' userVisible='true'/>
<Change ServiceId1='432' ServiceId2='455'/>
Description: Defines a possible change from the stop of one service to that of another. It merges two services into one so that the last stop of the first service is linked to the first stop of the second service. With <Change>, the changes are always guaranteed.
- ServiceId1 and ServiceId2 are the identifiers of the services to be merged into one.
- StopIx1 (optional) is the order number of the stop in the first service from which the passenger changes.
- StopIx2 (optional) is the order number of the stop in the second service to which the passenger changes.
- ChangeTime (optional) defines the time in minutes it takes to make the change.
- ChangeCost (optional) is a positive weight factor that can be used to prefer or discourage this change.
- UserVisible (optional, boolean) defines whether the change is visible/invisible to the user.
- Guaranteed (optional, boolean) defines if the change is marked as being guaranteed.
Notes: If the change is marked as guaranteed in the Guaranteed attribute, the passenger should be notified that the second service waits for the first service.
<!-- service contents -->
<!-- more services -->
Description: The role of the <Timetbls> element is to act as a container element for the timetables in <Service> elements (see below).
<ServiceNbr CompanyId='3668' ServiceNbr='1024 1' Variant='24' Name='Erottaja - Seurasaari'/>
<ServiceAttribute AttributeId='743' FootnoteId='27'/>
<Stop Ix='1' StationId='1030130' Arrival='0550'/>
<Stop Ix='2' StationId='1020174' Arrival='0551'/>
<Stop Ix='3' StationId='1040128' Arrival='0552'/>
<Stop . . . />
Description: Defines information about a single departure (= a service in Kalkati.net). The <Service> elements of <Timetbls> represent the actual (all) timetable data in the system. <Service> elements appear only as child elements of <Timetbls>.
- ServiceId is the identification number of the service (for referencing from other elements). Note that the serviceId is only an internal identification number (used in the database), it is neither a line number nor a line code!
- <ServiceNbr> (1..*) - Defines the line number and other identifiers for the service, and the company that provides this transport service. Every leg of transport must be covered by exactly one <ServiceNbr> element.
  - ServiceNbr defines the code that the transport provider uses to identify the service.
  - CompanyId defines the transport company that operates the service (by referencing <Company>).
  - Variant (optional) defines the code that the passenger uses to identify the service, i.e. the “line number”.
  - Name (optional) defines a description of the service route in human-readable form (e.g. “Otaniemi – Tapiola – Soukka“).
  - FirstStop and LastStop (optional) limit the validity of the <ServiceNbr> element to a subinterval of the service's stops.
- <ServiceValidity> (1..*) - Provides information about the validity of the service, in other words, the days on which the service is operated.
  - FootnoteId defines the validity period (by referencing <Footnote>). The validity information always applies to all the stations on the line; individual stations cannot have their own validity dates.
  - FirstStop and LastStop (optional) limit the validity of the <ServiceValidity> element to a subinterval of the service's stops.
- <ServiceTrnsMode> (1..*) - Provides information about the means of transport (bus/tram/train etc.) of the service.
  - TrnsmodeId defines the means of transport (by referencing <Trnsmode>).
  - FirstStop and LastStop (optional) limit the validity of the <ServiceTrnsMode> element to a subinterval of the service's stops.
- <ServiceAttribute> (0..*) - Provides additional free-form information (attributes) about the service.
  - AttributeId defines the attribute (by referencing the <Trnsattr> in which the attribute content is defined).
  - FootnoteId defines the validity period of the attribute (by referencing <Footnote>).
  - FirstStop and LastStop (optional) limit the validity of the <ServiceAttribute> element to a subinterval of the service's stops.
- <Stop> (1..*) - Declares a stop of this service. Also provides information about the arrival and departure times.
  - Ix is the order number of this stop, starting from 1.
  - StationId defines the station (by referencing <Station>).
  - Departure (optional) defines the service's departure time (in local time) from the stop.
  - Arrival (optional) defines the arrival time (in local time) at the stop. If Arrival is not set, it is expected to be the same as the departure time.
  - Type (optional) defines the type of the stop (s, f, i, c, or p); see the schema for details.
Notes: In the dump file, the FootnoteId of <ServiceAttribute> always references the same <Footnote> as the FootnoteId in <ServiceValidity>.
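The stop rules above (ordering by Ix, arrival defaulting to departure) can be sketched in Python as follows. The surrounding `<Service ServiceId='1'>` wrapper and the Departure values in the sample are made up for the example, and `parse_stops` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Service ServiceId='1'>
  <ServiceNbr CompanyId='3668' ServiceNbr='1024 1' Variant='24' Name='Erottaja - Seurasaari'/>
  <Stop Ix='2' StationId='1020174' Arrival='0551' Departure='0552'/>
  <Stop Ix='1' StationId='1030130' Departure='0550'/>
</Service>
"""


def parse_stops(service):
    """Return the stops of a <Service> as dicts, ordered by Ix."""
    stops = []
    for stop in service.findall("Stop"):
        departure = stop.get("Departure")
        # If Arrival is missing it is taken to equal the departure time.
        arrival = stop.get("Arrival") or departure
        stops.append({
            "ix": int(stop.get("Ix")),
            "station_id": stop.get("StationId"),
            "arrival": arrival,
            "departure": departure,
        })
    stops.sort(key=lambda s: s["ix"])
    return stops


service = ET.fromstring(SAMPLE)
stops = parse_stops(service)
line_number = service.find("ServiceNbr").get("Variant")  # passenger-facing code
print(line_number, stops)
```

Sorting by Ix is a defensive choice; the stops are expected to appear in order already.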
<Footnote FootnoteId='27' Vector='11111101111110111110011111101111110111111011111' Firstdate='2008-06-02'/>
Description: Defines a validity period. Footnotes are used by <ServiceValidity> and <ServiceAttribute> elements in <Service>. Validity periods are specified as a set of days on which the timetable is either valid or not.
- FootnoteId is the identification number of the footnote (for referencing from other elements).
- Vector specifies the validity period as a bit vector string of 0s and 1s. The bits in the vector represent subsequent days. A “1” in the vector indicates that the timetable is valid on that day, a “0” that it is not.
- Firstdate (optional) specifies the start date of the vector, i.e., the date of the first bit in the vector. If this attribute has no value, the start date of the vector is the Firstday of the <Delivery> element.
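Checking whether a service runs on a given date amounts to indexing the bit vector by the number of days since its start date. The following Python sketch uses the example footnote above; the function name `is_valid_on` is hypothetical, and treating dates outside the vector as "not valid" is an assumption of this example, not something the format states.

```python
from datetime import date


def is_valid_on(vector, first_date, day):
    """Return True if the footnote vector marks `day` as valid.

    first_date is the date of the first bit: the footnote's own start
    date, or the start of the delivery period when the footnote has none.
    """
    offset = (day - first_date).days
    if offset < 0 or offset >= len(vector):
        return False  # assumption: outside the vector means not valid
    return vector[offset] == "1"


VECTOR = "11111101111110111110011111101111110111111011111"
FIRST = date.fromisoformat("2008-06-02")

print(is_valid_on(VECTOR, FIRST, date(2008, 6, 2)))   # first bit
print(is_valid_on(VECTOR, FIRST, date(2008, 6, 8)))   # seventh bit
```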
4 Questions and answers
4.1 Keys and references
Q: Can you explain the XML Schema keys and references?
A: Keys are used as the unique IDs of elements. For example, the ServiceId attribute of the <Service> element is a unique ID (a key) of that element. A reference is a relation between two elements A and B: the key (or ID) of element B is given in the reference attribute of element A. For example, a <Stop> element references a <Station> element through its StationId attribute. Element A is then said to be in a relation with element B and is linked to its data. References in XML Schema are analogous to relations in relational databases.
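In application code, following a reference typically means building a dictionary from key to element and looking the referenced ID up in it, much like a join in a relational database. The Python sketch below resolves the StationId of each stop to a station name. The root tag `dump`, the assumption that <Station> elements are direct children of the root, the ServiceId value, and the unknown station ID 9999999 are all made up for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<dump>
  <Station StationId='2241206' Name='Mankkaankallio'/>
  <Service ServiceId='1'>
    <Stop Ix='1' StationId='2241206' Departure='0550'/>
    <Stop Ix='2' StationId='9999999' Departure='0551'/>
  </Service>
</dump>
"""


def resolve_stops(root):
    """Return (ServiceId, Ix, station name) for every stop in the document."""
    # Index stations by their key. Only direct children of the root are
    # used so that translated <Station> entries in <Synonym> are skipped.
    stations = {s.get("StationId"): s for s in root.findall("Station")}
    resolved = []
    for service in root.iter("Service"):
        for stop in service.findall("Stop"):
            station = stations.get(stop.get("StationId"))
            name = station.get("Name") if station is not None else None
            resolved.append((service.get("ServiceId"), stop.get("Ix"), name))
    return resolved


resolved = resolve_stops(ET.fromstring(SAMPLE))
print(resolved)
```

A dangling reference yields None here; a validating XML parser would instead reject such a document against the schema's key definitions.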
4.2 Local/Global keys
Q: What is meant by local and global keys?
A: The keys in Kalkati.net XML are divided into three categories based on their scope (namespace): local keys, global keys, and provider-specific keys. The key scopes are defined in the <key> definitions section of the schema. A key's scope can be seen from the name attribute of its <key> definition. The <key> names follow the pattern [ElementType][Scope]Key. For example, the key name CompanyLocalKey states that the CompanyId key is defined in a local scope.
Local scope means a key is unique only within the current data file. For example, ServiceLocalKey means that the ServiceIds are valid only in the document in which they are defined. Thus, the IDs have no meaning in any other context.
Global scope means a key is a commonly recognized code that may have been defined in some universal code specification. For example, the keys of <Country> elements come from the ISO specification for country codes.
Provider (specific) scope means a key is unique to the timetable data provider only. For example, StationProviderKey means the StationId is a unique ID for a station only within the data of that timetable data provider.
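The [ElementType][Scope]Key naming pattern is regular enough to be split mechanically. The Python sketch below does so with a regular expression; it assumes the scope part is spelled exactly Local, Global, or Provider, as in the key names quoted above, and `split_key_name` is a hypothetical helper.

```python
import re

# Element type, then one of the three scopes, then the literal "Key".
KEY_NAME = re.compile(r"^(?P<element>[A-Z][A-Za-z]*?)(?P<scope>Local|Global|Provider)Key$")


def split_key_name(name):
    """Split a schema key name into (element type, scope)."""
    match = KEY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised key name: {name!r}")
    return match.group("element"), match.group("scope")


print(split_key_name("CompanyLocalKey"))
print(split_key_name("StationProviderKey"))
```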
4.3 Departure notes
Q: Where are the departure-specific notes (e.g. low-floor bus) specified in the XML data? Where are the short codes (e.g. M for low-floor bus) specified?
A: The special notes are known as transport attributes in Kalkati.net XML. They are specified in the <Trnsattr> elements. A transport attribute is linked to a departure (service) via the AttributeId attribute of the <ServiceAttribute> element (a child element of <Service>).
Unfortunately, Kalkati.net XML has no specific attribute in which the short codes could be specified. However, it is possible to overcome this limitation by using the short codes as the TrnsattrId values.
4.4 Special days
Q: How are the departures of special days presented in data?
A: There is no concept of a special day in Kalkati.net XML. All days are basically equal to each other. In the data, if a service is not being operated on a special day such as Christmas day, it is indicated in the corresponding <Footnote> bit vector string of the service (see <Footnote> for details). Instead, if there is a service that is being operated only on a special day and with a special timetable, such a service is specified in an extra <Service> element of its own.
4.5 Substitute timetables
Q: Is it possible to have substitute timetables for the timetables of weekdays that are actually holidays? For example, to use Sunday timetables on Easter days?
A: Not really. Kalkati.net XML has no concept of a substitute timetable. Special timetables always have to be specified in their own <Service> elements. It is possible, however, to define a single timetable for a line that is valid throughout the year on all holidays. This is done using a <Footnote> that specifies that the timetable is valid only on the holidays. The bit vector string of the <Footnote> would look something like: 10000000000000000000000100000000000000000010.. (= with a lot of zeros for regular days).
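A holiday-only bit vector of this kind can be generated from a list of dates. The Python sketch below builds the vector string for a period; `build_vector` is a hypothetical helper and the dates are illustrative only.

```python
from datetime import date, timedelta


def build_vector(first_day, last_day, valid_days):
    """Return a '0'/'1' string with one bit per day from first_day to last_day."""
    length = (last_day - first_day).days + 1
    return "".join(
        "1" if first_day + timedelta(days=i) in valid_days else "0"
        for i in range(length)
    )


# Illustrative period and holiday dates.
holidays = {date(2008, 12, 25), date(2008, 12, 26)}
vector = build_vector(date(2008, 12, 24), date(2008, 12, 27), holidays)
print(vector)  # 0110
```

The first bit corresponds to the start date that would be given in the footnote's Firstdate attribute.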
4.6 Element Order in the XML File
Q: Are the elements always in same order in Kalkati.net XML file?
A: Yes. This is because the element groups in the Kalkati.net XML schema are always declared as sequences. The sequence indicator in XML Schema means that the elements must appear in the same order in which they are declared in the sequence. In general, parsing sequenced elements is often much faster than parsing unordered elements.
Note that this also applies to the <Synonym> element. The different element types must be grouped so that the groups appear in the same order as in the rest of the file. Within a group, however, individual elements can be in any order.
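A fixed element order also suits streaming parsers, which process the file front to back without holding it all in memory. The Python sketch below uses the standard library's `iterparse` to visit <Service> elements one at a time; the root tag `dump`, the <Timetbls> nesting shown in the sample, and the ServiceId values are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""
<dump>
  <Timetbls>
    <Service ServiceId='1'>
      <Stop Ix='1' StationId='1030130' Departure='0550'/>
      <Stop Ix='2' StationId='1020174' Departure='0551'/>
    </Service>
    <Service ServiceId='2'>
      <Stop Ix='1' StationId='1040128' Departure='0600'/>
    </Service>
  </Timetbls>
</dump>
"""


def iter_services(source):
    """Yield (ServiceId, number of stops) for each <Service>, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Service":
            yield elem.get("ServiceId"), len(elem.findall("Stop"))
            elem.clear()  # drop the children already processed


services = list(iter_services(io.BytesIO(SAMPLE)))
print(services)
```

With the real dump file, `source` would be the opened XML member of the archive rather than an in-memory sample.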
4.7 Over-midnight Departures
Q: How are the over-midnight departures presented in Kalkati.net XML?
A: In Kalkati.net XML the departure times in <Stop> elements are specified using a 32-hour clock. This means a bus may depart from a stop at, e.g., 24:15 (= 0:15 am on the following calendar day). Thus, in Kalkati.net XML a departure (service) never rolls over to the next day; times past midnight simply continue from 24:00 upwards. Below is an example of the representation of an over-midnight departure (service).
<ServiceNbr ServiceNbr="540" Name="Espoo - Ikean liittymä – Helsinki- Vantaan lentoasema" CompanyId="2" />
<ServiceValidity FootnoteId="4" />
<ServiceTrnsmode TrnsmodeId="1" />
<ServiceAttribute FootnoteId="4" AttributeId="DT.s"/>
<ServiceAttribute FootnoteId="4" AttributeId="NT.27"/>
<Stop Ix="1" StationId="2987743" Departure="2350"/>
<Stop Ix="2" StationId="2345553" Departure="2352"/>
<Stop Ix="3" StationId="2837456" Departure="2400"/>
<Stop Ix="4" StationId="2223048" Departure="2403"/>
<Stop Ix="5" StationId="2304894" Departure="2405"/>
<Stop Ix="6" StationId="2008737" Departure="2408"/>
<Stop Ix="7" StationId="1439457" Departure="2410"/>
<Stop Ix="8" StationId="1483633" Departure="2413"/>
<Stop Ix="9" StationId="1098744" Departure="2417"/>
<Stop Ix="10" StationId="3943774" Departure="2421"/>
<Stop Ix="11" StationId="3303008" Departure="2424"/>
<Stop Ix="12" StationId="3098573" Departure="2425"/>
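When such times are converted to real timestamps, hour values of 24 and above have to be carried over to the next calendar day. A minimal Python sketch, assuming the times are four-digit HHMM strings as in the example above (the helper names and the service date are made up for the example):

```python
from datetime import date, datetime, time, timedelta


def parse_hhmm(value):
    """Convert an HHMM string (hours may be 24 or more) to minutes after midnight."""
    hours, minutes = int(value[:-2]), int(value[-2:])
    return hours * 60 + minutes


def to_datetime(service_day, value):
    """Combine the operating day of a service with an HHMM stop time."""
    midnight = datetime.combine(service_day, time(0, 0))
    return midnight + timedelta(minutes=parse_hhmm(value))


print(parse_hhmm("2350"), parse_hhmm("2403"))
print(to_datetime(date(2008, 6, 2), "2403"))
```

Working in minutes after midnight keeps stop times within one service monotonically increasing, which simplifies comparisons in routing code.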
The following table presents the “Trident project codes” for different vehicle types (transport modes).
|21||long/mid distance train|