Revision history for Perl CPAN module Lingua::EN::AddressParse 1.27 05 Jan 2018 Stopped errors from undefined base_satreet_name componet. Giothub issue reported by Bill Baker Allowed for house names at start of address, such as 'Manor House 12 Smith St... Added more street types Removed Island as a street type, too many clashes with Island appearing in other components Added extra test case to main.t for street direction prefix 1.26 15 Sep 2016 the auto_clean option now standardises more street types, sub property types and identifiers added more street names, types and sub property identifier patterns Directional streets (like East St) given higher matching priority Fixed bug in synopsis example Improved documentation 1.25 12 Jul 2016 Used github as repository Improved documentation 1.24 11 Jun 2015 The case_components method has been removed, only the components method is needed (due to change to using upper case grammar in version 1.22) an optional flag allows for all components to be upper cased Allow for numeric avenues with optional fraction, such as 'Avenue 12 1/2' Added cases such as 'VIA CAPRI', to the street noun pattern Allow for property numbers of form xy.5 Extract any building, level, floor data prior to parsing and save as separate attributes 1.23 20 Apr 2015 Allowed for 4 word suburb names Introduced street_noun pattern, no need to name all 'The Corso, Avenue...' instances Detect vowels sounds in suburb words No vowel sound in street or suburb now raises a warning, not an error in properties hash Related to the above, added a 'warning_desc' component to address object Sub property portion of identifier now correctly assigned, so '2/24A' has sub_property_id => 2, property_id => 24A Added sub_proprty_type components, such as "Unit", "Apt" etc Allowed for half street numbers, such has '45 1/2 Some St; Refactored some repetitive code sections 1.22 21 Jan 2015 fold all text to upper case prior to parsing, regexp matching is faster if case insensitive use extended regexp (/x) for better readability added many more valid street combinations allowed for prefix before ordinal street, such as "Bay 24th Street" folded 'The Avenue' type of street back into main street grammar, removed street_noun grammar NOTE: component 'street' changed to 'street_name' for clarity added street_direction_prefix component added base_street_name component (street name less direction prefix) auto_clean option now removes full stops, so P.O. Box 12 => PO Box 12, grammar no longer accepts full stops 1.21 14 Oct 2014 allowed for more untyped streets such as 'Avenue C', 'State Route 19' more transformations added to the _clean function added more street types removed indirect object notation for 'new' method validate 'country' parameter passed in as an argument to the new method 1.20 22 Jul 2014 added person's street name such as 'Dr Martin Luther King Jr' new component added, po_box_type to describe type of post box, so 'PO BOX 123 Newtown Private Boxes WA 6865' will return suburb of Newtown and po_box_type of Private Boxes fixed incorrect error descriptions on parsing error added more sub property identifiers added more street types detect missing mandatory country parameter in arguments to 'new' method auto_clean now removes full stops (as in P.O. Box, Rd. etc), simplifies grammar 1.19 7 Jul 2014 auto_clean always cleans string, not just on parsing error during auto_clean, spaces now removed in sub property identifer, such as 1 A Main St -> 1A Main St fixed floor description in sub property identifier added street name such as 'John F Kennedy Boulevard' post codes can now be optional with the force_post_code argument added more 3 word street names added more street types many more auto_clean functions for redundant, missing or malformed characters stopped single letter streets returning 'no vowel sound' error added more tests 1.18 19 Sep 2013 Fixed test for US suburban address in main.t 1.17 18 Sep 2013 Added Build.PL Expanded test coverage to more address formats Expanded auto_clean functions Corrected casing of letters in sub property identifiers, such as Unit 23B (retain B as capital) Allowed for government roads, such as 12 US Highway 19 Allowed for single word streets such as Broadway, no street type Allowed for more 3 word street types Added more ordinal street types Allowed post box numbers to be up to 6 digits Added more street types Added more sub property types, unified sub_property grammar Added more street names Improved report method Keep street direction in all capitals when casing components _validate now checks for PO box format 1.16 20 Jan 2011 Moved AddressGrammar.pm to Lingua::EN::AddressParse::Grammar name space Included tests for Canadian and UK addresses 1.15 17 Jun 2007 Added sample pre parser for correcting common errors Added more street types Added more street nouns Added more sub property types Allowed for optional street number, name and type in rural addresses Allowed for up to 5 numbers in property identifier Allowed for street names with up to three words, such as 'Tin Can Bay Road' Allowed for sub building patterns such as 'Level 6 Tower A 123 Main St' Thanks to Kieren Diment for test cases 1.14 18 Jul 2005 Fixed auto_clean option so that quotes are retained, needed for rural property names Added report method to dump address properties and components Updated distribution to current CPAN requirements 1.13 22 Jun 2005 Added US style sub properties, such as 123 Main St Suite # 12 Somewhere CA 92345 Added US style street directional suffix, such as 123 Main St S Somewhere CA 92345 Added new address type 'road_box'. This allows for box identifier such as RMB, optionally street name and type, then suburb and post code. Sub property values are now returned as as separate component sub_property_identifier, not as part of the property_identifier component Created a better grammar for property_identifier Removed incorrect validity tests that were not accounting for ordinal street types Fixed auto_clean option so that numbers are retained in ordinal street types! Expanded and improved documentation and comments 1.12 07 Mar 2004 Removed any sub country descriptive text (in square brackets) that occurs from grammar. This was needed because of an upgrade to the data in Locale-SubCountry 1.11 21 Mar 2002 Let user specify country argument to 'new' method as either country code (AU) or full name (AUSTRALIA). Changed documentation to reflect this, altered United Kingdom code from 'UK' (incorrect) to 'GB' Fixed bug that was preventing Canadian addresses from parsing Added a test case for each country in main.t Improved report layout in demo.pl to allow for non Australian addresses 1.10 09 Feb 2002 Allowed for cases such as "Unit 4 12 Smith St." Improved speed by reordering sequence of street types Added option to only accept short form of sub_country, such as NW Thanks to Peter Schendzielorz for the following suggestions: Allowed sub_country names to occur in suburb, such as Victoria Valley VIC Added more street types Added more sub property types Added more post box types Added 'Mt', 'Sir' and 'Dame' to street prefixes, as in Mt Baw Baw Rd Allowed for cases like 'Grand Ridge Road' Allowed for cases like 'Close','Glen', 'St' etc to occur in either street name or street type Allowed for apostrophes and full stops in suburb_word, as in French's Forest Allowed for 3 digits post codes such as 800 (in Northern Territory, Aust) Allowed for box_number to end in a letter Allowed property name to be delimited by single or double quotes Accounted for street types as street names, such as 'The Corso', 'The Parkway' 1.05 24 Sep 2001 Added more street types Removed duplicated declaration for main.t 1.04 31 Aug 2001 Allowed for streets with numbers or single letters (42nd, A, B...) Allowed for ZIP codes of type 12345-6789 Thanks to Mike Edwards for noting these limitations Added 'Park'to street types, while accounting for cases like 'Park Lane, 'Moore Park Road' etc. Added extra test case Added street prefixes (like Old, East) back in to simplify grammar 1.03 24 May 2001 Allowed for forced abbreviations of sub country Removed POD directives from README 1.02 24 Apr 2001: Placed "my" declaration of loop variables inside foreach statements. NOTE THIS MEANS PERL 5.004 IS NOW THE MINIMUM REQUIREMENT! Full names for sub countries now supported (New South Wales as well as NSW) Used look aheads to simplify grammar (thanks to Damian Conway for his help) Removed street prefixes from grammar as they were not needed Returned street type (Road, Lane etc) as a separate element from street Made property identifiers optional for suburban addresses (Low St. Kew VIC 3012 is OK) Initialised values for all components and properties to empty string Stopped warnings generated by uninitialized strings in grammar Only allow country names that match initializing country string Removed space that was occurring between PO Box and following number Added more cases to test file main.t Improved demo.pl Fixed all POD and line terminator errors, thanks to Jason Gallagher 1.01 9 Apr 2001: Fixed "use Locale::SubCountry" call in AddressGrammar 1.00 18 Mar 2001: Upgraded to work with changes to Locale::SubCountry, specifically, US rather than USA being the valid code for initializing US addresses Moved grammar definition to AddressGrammar.pm module 0.03 14 May 2000: Added Canadian post codes, thanks to Steve Taylor Replaced commas with spaces, rather than removing them Removed @EXPORT and @EXPORT_OK as they were not needed 0.02 30 Apr 2000: Used Locale::Subcountry for list of states, counties etc Added more UK post code types, thanks to Mark Summerfield 0.01 28 Dec 1999: First Release