
xlocate 1: Classification of geocode result

Posted: Fri Jun 06, 2014 3:32 pm
by Joost
In many cases systems receive addresses from other systems that need to be used in transport planning. For example, the transport management system of a transport company can receive data via EDI from a client's order management system. Often this data lacks the coordinates needed for further planning and has to be geocoded.

When geocoding, there is always the question of whether the result quality is high enough to accept it and use it in further calculations without having a user check the result. These data sets can become quite big, and a difference of only a few percent in the automatic acceptance rate can cause many man-hours of manual checking. So how can you use the xLocate server to distinguish between a bad result and a good result?

The first way is to use the score. xLocate ranks each result between 100 (perfect match) and 0 (no match at all). You can find the score in the totalScore attribute of the ResultAddress object. This gives you a straightforward way to set a limit on what you accept automatically and what you send to manual geocoding. However, there is a downside: the score is calculated by a formula you cannot change, and that formula may not suit your business case.
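
A minimal Python sketch of such a score limit. The `Candidate` class and the threshold of 90 are hypothetical stand-ins chosen for illustration; only the totalScore range of 0 to 100 comes from the description above.

```python
from dataclasses import dataclass

# Hypothetical cut-off; tune it against your own data.
ACCEPT_THRESHOLD = 90


@dataclass
class Candidate:
    """Minimal stand-in for a ResultAddress (only the field used here)."""
    total_score: int  # mirrors the totalScore attribute, 0..100


def accept_automatically(candidate: Candidate, threshold: int = ACCEPT_THRESHOLD) -> bool:
    """Return True if the score alone is high enough to skip manual checking."""
    return candidate.total_score >= threshold


results = [Candidate(100), Candidate(94), Candidate(61)]
auto = [c for c in results if accept_automatically(c)]
manual = [c for c in results if not accept_automatically(c)]
```

Everything in `manual` would go to a user for checking, so raising or lowering the threshold directly trades manual effort against the risk of accepting a wrong match.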

For example, in a quote system for international transport you might not care whether a street can be found correctly. As long as the place or postcode is a good match, you know that reality will differ only slightly from your quote. In parcel delivery, however, the street becomes important again, because a wrong match can send the vehicle to the wrong side of the city. So there can be a need for a more detailed quality indicator than a single number.

This can be done by working with the field classifications. For many input fields you can request a classification, so you can build your own decision tree. These classifications are not returned by default. To request them, you need to add them as ResultField entries in your input. The result fields that can help you are:
  • POSTCODE_CLASSIFICATION
  • TOWN_CLASSIFICATION
  • STREET_CLASSIFICATION
  • HOUSENR_CLASSIFICATION
Note 1: Country and state are not classified, since xLocate only allows exact matches on this level.
Note 2: city and city2 are combined in the TOWN_CLASSIFICATION. It is a well-known fact that opinions on whether something is a city or a city2 can differ a lot. Instead of forcing the user to input the data exactly as the map provider has stored it, our geocoding algorithm can work around this. For example: the village of Heffen in Belgium is a district of the city of Mechelen according to the map provider. xLocate will allow you to enter Heffen as a city and will return Mechelen, Heffen as the result without adding a penalty for Heffen being in the city input field instead of the city2 input field.

The possible output values can be looked up in the FieldClassificationDescription enumeration.
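
To request the classifications, the four names go into the result-field list of the request. A minimal Python sketch that builds this element: the element name ArrayOfResultField_4 and both namespaces are copied from the SOAP sample further down in this thread, while the helper name `build_result_fields` is made up for illustration. Whether your client stack serializes the request exactly like this is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the SOAP sample in this thread.
TYPES_NS = "http://types.xlocate.xserver.ptvag.com"
XLOCATE_NS = "http://xlocate.xserver.ptvag.com"

CLASSIFICATION_FIELDS = [
    "POSTCODE_CLASSIFICATION",
    "TOWN_CLASSIFICATION",
    "STREET_CLASSIFICATION",
    "HOUSENR_CLASSIFICATION",
]


def build_result_fields(fields):
    """Build an ArrayOfResultField_4 element with one ResultField child per name."""
    parent = ET.Element(f"{{{TYPES_NS}}}ArrayOfResultField_4")
    for name in fields:
        child = ET.SubElement(parent, f"{{{XLOCATE_NS}}}ResultField")
        child.text = name
    return parent


element = build_result_fields(CLASSIFICATION_FIELDS)
print(ET.tostring(element, encoding="unicode"))
```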

An example of a general decision tree for transport could be:

If
(POSTCODE_CLASSIFICATION = EXACT and TOWN_CLASSIFICATION >= Medium and STREET_CLASSIFICATION >= High)
Or
(TOWN_CLASSIFICATION >= High and STREET_CLASSIFICATION >= High)
Then
accept the result
Else
send to manual geocoding

In this sample we take into account that a typo in a postcode can easily lead to another valid postcode, while a typo in a place name usually does not. The tree always requires the street to be classified as High, to make sure a vehicle ends up near the real location. It does not look at the house number classification because being in the correct street is close enough.
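
The decision tree above could be sketched in Python as follows. The ordering LOW < MEDIUM < HIGH < EXACT is an assumption made for this example; the actual value names have to be taken from the FieldClassificationDescription enumeration of your xLocate version.

```python
# Assumed ordering of classification values, from worst to best.
# Verify the real names in the FieldClassificationDescription enumeration.
RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "EXACT": 4}


def at_least(value: str, minimum: str) -> bool:
    """True if `value` ranks at or above `minimum`; unknown values never pass."""
    return RANK.get(value.upper(), 0) >= RANK[minimum]


def accept(postcode: str, town: str, street: str) -> bool:
    """Decision tree from the post; the house number is deliberately ignored."""
    postcode_path = (
        postcode.upper() == "EXACT"
        and at_least(town, "MEDIUM")
        and at_least(street, "HIGH")
    )
    town_path = at_least(town, "HIGH") and at_least(street, "HIGH")
    return postcode_path or town_path


# Usage: anything that is not accepted goes to manual geocoding.
decision = "accept" if accept("EXACT", "MEDIUM", "HIGH") else "manual"
```

Treating unknown values as rank 0 keeps the sketch on the safe side: a classification the code does not recognise sends the address to manual geocoding instead of accepting it silently.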

Re: Classification of geocode result

Posted: Tue Jan 10, 2017 4:09 pm
by Bernd Welter
Hello everyone,

thanks Joost - brilliant !!! I just spoke to a player who is very interested in this. Let me add some info:

The result elements of an xLocate.findAddress() output can be categorized as follows:
  • Properties describing the output address (country, state, postcode, city, city district, street, housenumber)
  • The coordinates (x,y)
  • Quality criteria comparing the result address with the input address (e.g. ResultField.FOUNDBY_STREET, totalScore)
  • Quality criteria comparing the result address with other result addresses (e.g. classification=UNIQUE versus HIGH)
  • Quality criteria describing the hit itself (e.g. DetailLevel==HNRINTERPOLATED, LEVEL=8: 5000-10000 population)
I attached two documents (old but important!) with valuable content:
gpGeocoder_Parameter.pdf
Documentation Geocoder_Parameter
gpGeocoder_Classification.pdf
Documentation Geocoder_Classification
So we offer a lot of information that can be used for a client-side decision. Feel free to give us your feedback!

Best regards Bernd

PS: Here is a sample ...

Code:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
	<soap:Body>
		<findAddress xmlns="http://types.xlocate.xserver.ptvag.com">
			<Address_1 city="dresden" city2="" country="D" houseNumber="8" postCode="01326" state="" street="helfenberger grund" />
			<ArrayOfSearchOptionBase_2>
				<SearchOptionBase xsi:type="SearchOption" value="1" param="SEARCH_BINARY" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="1" param="SEARCH_PHONETIC" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="0" param="SEARCH_FUZZY" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="1" param="STREET_RETURNALLHNR" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="1" param="CITY_RETURNALLCITY2" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="0" param="MULTIWORDINDEX_ENABLE" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="1" param="POSTCODE_AGGREGATE" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="0" param="INTERSECTIONS_ENABLE" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="0" param="ASTERISKMODE" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="3" param="COUNTRY_CODETYPE" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="0" param="STREET_HNRPOSITION" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="1" param="SWAPANDSPLITMODE" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="" param="HNR_OFFSET" xmlns="http://xlocate.xserver.ptvag.com" />
				<SearchOptionBase xsi:type="SearchOption" value="DEF" param="RESULT_LANGUAGE" xmlns="http://xlocate.xserver.ptvag.com" />
			</ArrayOfSearchOptionBase_2>
			<ArrayOfSortOption_3 xsi:nil="true" />
			<ArrayOfResultField_4>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">STATE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">ADMINREGION</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">CITY</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">CITY2</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">POSTCODE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">STREET</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">HOUSENR</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COORDX</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COORDY</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">DETAILLEVEL</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">DETAILLEVEL_DESCRIPTION</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">POPULATION</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">EXTENSIONCLASS</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">LEVEL</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">ISCITYDISTRICT</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY_ISO2</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY_ISO3</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY_COUNTRYCODEPLATE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY_DIALINGCODE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY_CAPITAL</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">COUNTRY_NAME</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">HOUSENR_SIDE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">HOUSENR_STRUCTURE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">HOUSENR_STARTFORMAT</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">HOUSENR_ENDFORMAT</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">APPENDIX</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">SCORE_TOTALSCORE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">SCORE_FINALPENALTY</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">FOUNDBY_CITY</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">FOUNDBY_CITY2</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">FOUNDBY_POSTCODE</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">FOUNDBY_STREET</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">CLASSIFICATION</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">CLASSIFICATION_DESCRIPTION</ResultField>
				<ResultField xmlns="http://xlocate.xserver.ptvag.com">SWAPANDSPLITMODE</ResultField>
			</ArrayOfResultField_4>
			<CallerContext_5 log1="PTVXLocate Testclient" log2="" log3="">
				<wrappedProperties xmlns="http://baseservices.service.jabba.ptvag.com">
					<CallerContextProperty key="CoordFormat" value="PTV_MERCATOR" />
					<CallerContextProperty key="Profile" value="default" />
					<CallerContextProperty key="ResponseGeometry" value="WKT,PLAIN" />
				</wrappedProperties>
			</CallerContext_5>
		</findAddress>
	</soap:Body>
</soap:Envelope>
And the response:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns2:findAddressResponse xmlns:ns6="http://exception.core.jabba.ptvag.com" xmlns:ns1="http://wrappertypes.service.jabba.ptvag.com" xmlns:ns4="http://common.xserver.ptvag.com" xmlns:ns3="http://baseservices.service.jabba.ptvag.com" xmlns:ns2="http://types.xlocate.xserver.ptvag.com" xmlns:ns0="http://xlocate.xserver.ptvag.com">
      <ns2:result errorDescription="" errorCode="0">
        <ns0:wrappedResultList>
          <ns0:ResultAddress classificationDescription="UNIQUE" detailLevelDescription="HNRINTERPOLATED" totalScore="100" countryCapital="Berlin" appendix="" adminRegion="Dresden" houseNumber="8" street="Helfenberger Grund" city2="" city="Dresden" postCode="01326" state="Sachsen" country="D">
            <ns0:wrappedAdditionalFields>
              <ns0:AdditionalField value="D" field="COUNTRY"/>
              <ns0:AdditionalField value="Sachsen" field="STATE"/>
              <ns0:AdditionalField value="Dresden" field="ADMINREGION"/>
              <ns0:AdditionalField value="Dresden" field="CITY"/>
              <ns0:AdditionalField value="" field="CITY2"/>
              <ns0:AdditionalField value="01326" field="POSTCODE"/>
              <ns0:AdditionalField value="Helfenberger Grund" field="STREET"/>
              <ns0:AdditionalField value="8" field="HOUSENR"/>
              <ns0:AdditionalField value="1385117" field="COORDX"/>
              <ns0:AdditionalField value="5103077" field="COORDY"/>
              <ns0:AdditionalField value="9" field="DETAILLEVEL"/>
              <ns0:AdditionalField value="HNrInterpolated" field="DETAILLEVEL_DESCRIPTION"/>
              <ns0:AdditionalField value="14" field="POPULATION"/>
              <ns0:AdditionalField value="0" field="EXTENSIONCLASS"/>
              <ns0:AdditionalField value="1" field="LEVEL"/>
              <ns0:AdditionalField value="false" field="ISCITYDISTRICT"/>
              <ns0:AdditionalField value="DE" field="COUNTRY_ISO2"/>
              <ns0:AdditionalField value="DEU" field="COUNTRY_ISO3"/>
              <ns0:AdditionalField value="D" field="COUNTRY_COUNTRYCODEPLATE"/>
              <ns0:AdditionalField value="0049" field="COUNTRY_DIALINGCODE"/>
              <ns0:AdditionalField value="Berlin" field="COUNTRY_CAPITAL"/>
              <ns0:AdditionalField value="Deutschland" field="COUNTRY_NAME"/>
              <ns0:AdditionalField value="1" field="HOUSENR_SIDE"/>
              <ns0:AdditionalField value="2" field="HOUSENR_STRUCTURE"/>
              <ns0:AdditionalField value="22" field="HOUSENR_STARTFORMAT"/>
              <ns0:AdditionalField value="22" field="HOUSENR_ENDFORMAT"/>
              <ns0:AdditionalField value="" field="APPENDIX"/>
              <ns0:AdditionalField value="100" field="SCORE_TOTALSCORE"/>
              <ns0:AdditionalField value="500000015" field="SCORE_FINALPENALTY"/>
              <ns0:AdditionalField value="32" field="FOUNDBY_CITY"/>
              <ns0:AdditionalField value="0" field="FOUNDBY_CITY2"/>
              <ns0:AdditionalField value="32" field="FOUNDBY_POSTCODE"/>
              <ns0:AdditionalField value="1" field="FOUNDBY_STREET"/>
              <ns0:AdditionalField value="5" field="CLASSIFICATION"/>
              <ns0:AdditionalField value="Unique" field="CLASSIFICATION_DESCRIPTION"/>
              <ns0:AdditionalField value="0" field="SWAPANDSPLITMODE"/>
            </ns0:wrappedAdditionalFields>
            <ns0:coordinates wkt="POINT (1540179.8321 6619323.2057)">
              <ns4:point y="6619323.2057" x="1540179.8321"/>
            </ns0:coordinates>
          </ns0:ResultAddress>
        </ns0:wrappedResultList>
      </ns2:result>
    </ns2:findAddressResponse>
  </soap:Body>
</soap:Envelope>
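
For a client-side decision, the quality criteria have to be pulled out of such a response. A minimal sketch with Python's standard library, assuming the response arrives as a string: the embedded XML is a trimmed copy of the response above, and the function name `parse_results` and the dictionary keys are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLOCATE_NS = "http://xlocate.xserver.ptvag.com"

# Trimmed copy of the response above, reduced to the parts used here.
RESPONSE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns2:findAddressResponse xmlns:ns2="http://types.xlocate.xserver.ptvag.com" xmlns:ns0="http://xlocate.xserver.ptvag.com">
      <ns2:result errorDescription="" errorCode="0">
        <ns0:wrappedResultList>
          <ns0:ResultAddress classificationDescription="UNIQUE" detailLevelDescription="HNRINTERPOLATED" totalScore="100" city="Dresden" postCode="01326">
            <ns0:wrappedAdditionalFields>
              <ns0:AdditionalField value="9" field="DETAILLEVEL"/>
              <ns0:AdditionalField value="32" field="FOUNDBY_POSTCODE"/>
            </ns0:wrappedAdditionalFields>
          </ns0:ResultAddress>
        </ns0:wrappedResultList>
      </ns2:result>
    </ns2:findAddressResponse>
  </soap:Body>
</soap:Envelope>
"""


def parse_results(xml_text):
    """Return one dict per ResultAddress with the quality-related values."""
    root = ET.fromstring(xml_text.strip())
    parsed = []
    for address in root.iter(f"{{{XLOCATE_NS}}}ResultAddress"):
        # Collect the AdditionalField entries as a field -> value mapping.
        additional = {
            field.get("field"): field.get("value")
            for field in address.iter(f"{{{XLOCATE_NS}}}AdditionalField")
        }
        parsed.append({
            "total_score": int(address.get("totalScore")),
            "classification": address.get("classificationDescription"),
            "detail_level": address.get("detailLevelDescription"),
            "additional": additional,
        })
    return parsed


results = parse_results(RESPONSE)
```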

Re: Classification of geocode result

Posted: Sat Feb 11, 2017 8:14 am
by MISTERX
Dear Joost, dear Bernd,

Let's have a little discussion about some things from my practice:

#1 STATE/DISTRICT:
Customers often tell me that their customer master data (CMD) is 100% correct and reliable with respect to STATE/DISTRICT. Because of this, they pushed me to take STATE/DISTRICT into account in order to "minimize" the number of proposals, since a user in dialog mode might be confused by a long list. In the real world (we are talking about SAP as the CMD provider), CMD is often created by typing the address into certain UI fields while the STATE/DISTRICT information is not actually available, yet the typist is required to supply it. Obviously it takes some experience with the countries concerned to enter or choose it correctly. Remark: my customers often have their own customers worldwide, and the CMD team is responsible for entire regions, such as South Asia and the Pacific.
Assuming this has been practiced for many years, starting long before GIS was widely introduced, we face a big problem the moment we enhance the sourcing and reworking process of CMD with geocoding.

In Dialog-Mode (DM) we could not take STATE/DISTRICT into account, because it is not feasible to tell the user "Hi Dear, sorry there is no proposal/match to your input. Please make some educated guesses about which portion of your input might cause problems. By the way: are you sure that your STATE/DISTRICT is correct?".

In Non-Dialog-Mode (Background-Mode = BM) this problem seems to be a big issue. In DM a keen user with some experience is able to get a geocode by varying the input or by looking up the customer on the internet (Google etc.). But this is time-consuming and not very efficient!
It takes fairly sophisticated code to vary the input and re-request xLocate based on the previous results. Furthermore, the format of STATE/DISTRICT varies depending on the provider and area. Two examples: in North America the information is an abbreviation (NJ, NY, AB etc.), in Europe it is the full name (CH: Ticino = TI, DE: Baden-Württemberg = BW). This causes a lot of problems:
1. As most UIs (like SAP) work with match codes (abbreviations), it is necessary to have and use a match table.
2. UIs often provide drop-down lists for those matchings. Users often see the right item but miss it because they didn't click it properly. And of course the available information can simply be wrong . . .
3. As STATE/DISTRICT is handled strictly by xLocate, we have to enter the exact spelling into the match table. But what do we see in e.g. SAP? The description field in the table is too short and Baden-Württemberg is shortened to "Baden-Wuerttemb.". What a *f*! And on top of that: for Switzerland the SAP standard table contains TI = TICINO, but for BW (Baden-Württemberg) the key is "08"!
4. xLocate only accepts certain languages for STATE/DISTRICT, and this may cause wrong inputs by a user who is not very familiar with a small country like e.g. Switzerland.
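
A minimal Python sketch of the match table from point 1. The table name, the function name and the key layout are hypothetical; the two entries are the examples from this post, and the exact spelling xLocate expects has to be verified against your map data.

```python
# Hypothetical match table: (country, source-system region key) -> state name.
# Both entries come from the examples in this post; the spelling the
# geocoder expects must be checked against the map data in use.
STATE_MATCH_TABLE = {
    ("CH", "TI"): "Ticino",
    ("DE", "08"): "Baden-Württemberg",
}


def resolve_state(country: str, region_key: str) -> str:
    """Return the full state name, or "" so the state is left empty in the request."""
    key = (country.strip().upper(), region_key.strip().upper())
    return STATE_MATCH_TABLE.get(key, "")
```

Returning an empty string for unknown keys means a missing table entry leaves the state field empty instead of sending a value that is handled strictly and may suppress all results.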

In the end I'm asking myself why PTV handles STATE/DISTRICT strictly while STREET, CITY etc. are handled in a weak/fuzzy way. Is there a best practice? Did I get something wrong?

#2: SCORING:
Scoring is nice and seems to work as designed, but my practice shows that it does not help in many cases. Example: a user enters a US address with e.g. "Oak Ridge Ave" in the street field and "1234" as the house number. xLocate returns "1234 Oak Ridge Ave" with SCORE_TOTALSCORE=100. That's perfect, but it returns some additional proposals as well! Depending on the result list options we get "Oak Ridge Ave W", "Oak Ridge Ave E" and some more versions of "Oak Ridge Ave" with or without house numbers, with a score slightly below 100 (around 94). Please note that my example is somewhat synthetic, but I am able to provide specific ones from the US and Europe.
I understand that xLocate assumes there might be an error in the input, because East/West is only slightly different. Anyhow, the problems described above for DM and BM are similar: either the user has to check, vary and decide what the most likely address (position) is, or the background algorithm has to be sophisticated enough to decide by the score fields and detail level/classification!
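
One possible background-mode strategy, sketched in Python purely as an illustration and not as a PTV recommendation: accept the best candidate only if it has a perfect score and leads the runner-up by a minimum gap. The function name and both parameters are hypothetical.

```python
def pick_unambiguous(scores, min_score=100, min_gap=5):
    """Return the index of the best candidate if it clearly beats the rest, else None."""
    if not scores:
        return None
    # Candidate indices ordered by score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = order[0]
    if scores[best] < min_score:
        return None
    if len(order) > 1 and scores[best] - scores[order[1]] < min_gap:
        return None
    return best


# Scores as in the Oak Ridge Ave example: one perfect hit, several around 94.
chosen = pick_unambiguous([100, 94, 94])
```

A result of None would route the address to manual checking or to a second rule based on detail level and classification.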

Has anyone gained experience with this or found strategies worth sharing? I would appreciate any posts or replies containing meaningful advice!

Hint: Keep in mind that I'm talking about worldwide projects with 100 K customers or more! Assume just 1 minute per geocode in DM, or 1 second per account in BM based on 5 requests on average (including all inner and outer processing steps)!
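
To put these figures into perspective, a short Python calculation for the lower bound of 100 K customers; the eight-hour working day is an assumption added for illustration.

```python
CUSTOMERS = 100_000          # lower bound mentioned above
DM_SECONDS_PER_GEOCODE = 60  # 1 minute per geocode in dialog mode
BM_SECONDS_PER_ACCOUNT = 1   # 1 second per account in background mode
HOURS_PER_WORKDAY = 8        # assumption for illustration

dm_hours = CUSTOMERS * DM_SECONDS_PER_GEOCODE / 3600
bm_hours = CUSTOMERS * BM_SECONDS_PER_ACCOUNT / 3600
dm_workdays = dm_hours / HOURS_PER_WORKDAY

print(f"DM: {dm_hours:.1f} h (about {dm_workdays:.0f} working days)")
print(f"BM: {bm_hours:.1f} h")
```

Under these assumptions, dialog mode costs roughly 1,667 person-hours, while background mode needs a bit under 28 hours of processing time.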

Re: Classification of geocode result

Posted: Mon Feb 13, 2017 1:28 pm
by Bernd Welter
Hello Mr. X
(btw: keen username, should I call myself "dr. summer" then?),

I totally understand your use cases and I know how the current implementation deals with them: not sufficiently.
From my personal perspective the field STATE is more a kind of additional output than a decisive input.
I therefore recommend excluding it from the geocoding process and only referring to it as part of a user's dialog-based confirmation within a drop-down box, to simplify the choice.

Our xLocate2 will offer new approaches for single-field geocoding and I'm sure the awareness will improve then.

As this topic is valuable information for our development and solution directors, I forwarded the post to those players.
I know that this topic is crucial because almost every subsequent task like routing and mapping depends on proper coordinates. One more reason to escalate it to strategic players at PTV.

Let's see what they answer!

Best regards Bernd

Re: Classification of geocode result

Posted: Tue Feb 14, 2017 7:57 am
by MISTERX
Dear Bernd,

Thanks for your reply. Your conclusion to use only the STATE from the result as additional information matches our solution in specific projects, even though the customer didn't like this approach.

I'm curious about the new features in xLocate2 (as we are still at the very beginning with it) and hope that xLocate(1) matures and gets some improvements in handling STATE in the request.

Kind Regards

Re: Classification of geocode result

Posted: Tue Feb 14, 2017 8:36 am
by Bernd Welter
Hello Mr. X,

The single field search of xLocate 2 will be based on a new kind of additional data named AddressPoints.
This data is large (about 25GB to be stored in maps.path/gcd) and contains the so-called
  • rooftop coordinates of the house (entry, e.g. Bakerstreet 221B - which is another benefit of the data: suffixes!)
  • the street coordinates of the house, so the point where the routing usually starts
The coverage of this data varies from country to country but for almost every country in Europe we get 99% coverage from our data providers.

Furthermore, xLocate2 also offers a REST API.

Best regards Bernd