Find fuzzy matches for variety names using stringdist with methods "lv" and "jw"
Source: R/match_variety_fuzzy_module.R
find_fuzzymatch.Rd
Note: This function returns the fuzzy matches from stringdist.variety() and then places stricter requirements on variety names that end in a number with more than one digit: the ending number must exactly match a string of numbers in the fuzzy match. This requirement limits the number of close matches for numeric variety names. Because of this condition, take care when generating var_id from the raw names to remove any trailing digits that have no meaning (e.g. +25% refers to seed increase and not to a cultivar name).
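The stricter rule for numeric names can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper passes_numeric_rule() and the variety strings are hypothetical, and reading "a string of numbers" as a complete run of digits in the candidate is an assumption.

```r
# Hypothetical helper, not part of the package: sketches the stricter rule
# for variety names that end in a number with more than one digit.
passes_numeric_rule <- function(variety, candidate) {
  # Trailing run of two or more digits at the end of the variety name, if any
  trailing <- regmatches(variety, regexpr("[0-9]{2,}$", variety))
  if (length(trailing) == 0) {
    return(TRUE)  # rule does not apply, so the fuzzy match is kept
  }
  # Assumption: the ending number must equal one complete run of digits
  # found in the candidate fuzzy match
  candidate_numbers <- regmatches(candidate, gregexpr("[0-9]+", candidate))[[1]]
  trailing %in% candidate_numbers
}

# Made-up names for illustration only
passes_numeric_rule("abc1035", "abc 1035 cl")  # TRUE: "1035" appears in the candidate
passes_numeric_rule("abc1035", "abc 1036")     # FALSE: ending number differs
passes_numeric_rule("abc", "abc 2")            # TRUE: no multi-digit ending, rule skipped
passes_numeric_rule("abc25", "abc")            # FALSE: a leftover "25" blocks the match
```

The last call shows why meaningless trailing digits should be removed when building var_id: a leftover "25" from a raw name such as "abc +25%" would prevent an otherwise reasonable match.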
Usage
find_fuzzymatch(
  var_noexactmatch,
  data_name,
  intid_col,
  select_before = Sys.Date(),
  select_crops = NULL,
  knitroutputfolder = knitroutputfolder,
  db_folder
)
Arguments
- var_noexactmatch
A data.frame of the varieties without exact matches - the data.frame should contain a column for variety and internal id (intid)
- data_name
A string denoting the name of the dataset
- intid_col
A string denoting the name of the internal id column used to join the raw data and database
- select_before
A string in Ymd format. The function returns cultivars that were added to the database before this date.
- select_crops
A regular expression of crops separated by |. Note that this regex filters on the crop column rather than crop_type. For example, there are entries in the wheat file with crop Triticale, so to capture both, the input should be select_crops = "wheat|triticale".
- knitroutputfolder
A folder path to send output
- db_folder
A string path to the database controlled vocabulary folder
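As a rough illustration of how a select_crops pattern selects rows, the sketch below filters a small made-up table on its crop column. Only the column names crop and crop_type come from the argument description; the rows are invented, and case-insensitive matching is an assumption suggested by the lowercase pattern matching the crop Triticale.

```r
# Made-up rows standing in for the database; values are illustrative only
db <- data.frame(
  variety   = c("a", "b", "c"),
  crop      = c("Wheat", "Triticale", "Barley"),
  crop_type = c("wheat", "wheat", "barley"),
  stringsAsFactors = FALSE
)

select_crops <- "wheat|triticale"

# Filter on crop, not crop_type. Assumption: matching ignores case.
keep <- grepl(select_crops, db$crop, ignore.case = TRUE)
db[keep, ]
```

With select_crops = "wheat" alone, the Triticale row would be dropped even though its crop_type in this made-up table is wheat, which is why both crops appear in the pattern.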
See also
Other match variety modules: do_exactmatch(), process_fuzzymatch(), process_std_new_names()