This function is a wrapper around readxl::read_excel(), reading a CBS
socio-economic or peripherality index data file for a specific year. Its added
value is in figuring out the file's structure for you: by default it detects
the geographic level of the file (municipality, locality or statistical area)
from its contents and applies the matching header layout. The full set of
header parameters is available with il.cbs.muni:::df_cbs_index_params.
This function is marked experimental because the CBS index publications are not
stable across editions: the same geographic level lives in different table
numbers in different years and the file format alternates between xls and
xlsx. Detecting the level from the file content rather than relying on the
table number makes the reader robust to that shuffle - you give it one file and
it works out what the file is.
Usage
read_cbs_index(
path,
year,
index_type = c("ses", "peri"),
unit_type = NULL,
cols = NULL,
col_names = NULL,
quiet = FALSE
)
Arguments
- path
A character vector of length 1, denoting the local file path to the CBS index data file. A full list of the files published by the CBS is available on the relevant CBS page for either the Socio-Economic Status (SES) index or the peripherality index. Read a single file per call; to read several files, iterate (e.g. with purrr::map()).
- year
A numeric vector of length 1 denoting the year that the data file in path is for. This is the year the data describes, not the year in which it was published.
- index_type
A character vector of length 1, one of c("ses", "peri").
- unit_type
NULL (the default) or a character vector of length 1, one of c("muni", "yishuv", "sa"), denoting the geographic level: municipality, locality or statistical area. When NULL, the level is detected from the file content. When supplied, the given level is used as-is and detection is skipped, which lets you override a misdetection for a file whose level you already know.
- cols
<tidy-select> Columns to keep. The default NULL keeps all columns.
- col_names
A character vector containing the new column names of the output tibble. If NULL, the tibble uses the original column names. Must be the same length as the number of columns picked in cols. The names are assigned by position, so they follow the data's column order.
- quiet
A logical vector of length 1. When FALSE (the default), a message reports the geographic level used. Set to TRUE to silence it, for example when iterating over many files.
Value
A tibble with CBS index data for a specific year, where every row is a
single geographic unit (municipality, locality or statistical area) and every
column is a different variable for that unit. The geographic level used is also
stored in the "unit_type" attribute of the result. All columns are of type
character, so you need to parse them into the appropriate data types yourself.
Column names are merged from the relevant headers, with runs of whitespace
collapsed to a single space. Rows in which 80% or more of the cells are empty
(usually rows holding notes rather than data) are removed.
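Since every column comes back as character, a type-parsing step usually follows the read. The snippet below is a minimal sketch of one way to do that with base R's type.convert(); the data frame is a made-up stand-in for the reader's output (its column names and values are hypothetical), not real CBS data.

```r
# Toy stand-in for the reader's output: every column is character.
# Column names and values are invented for illustration only.
raw <- data.frame(
  code = c("3000", "4000", "5000"),
  name = c("A", "B", "C"),
  index_value = c("-0.25", "0.61", "1.32"),
  cluster = c("3", "7", "8"),
  stringsAsFactors = FALSE
)

# Let base R guess a type per column; as.is = TRUE keeps text as character
parsed <- type.convert(raw, as.is = TRUE)

# Codes are identifiers rather than quantities, so restore them as text
parsed$code <- raw$code

str(parsed)
```

Guessing types is convenient for a first look; for a pipeline you would more likely convert each column explicitly so that a stray note in a numeric column fails loudly instead of silently turning the column back into text.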
Details
Detection uses the raw row count and a header keyword, which separates the
three levels cleanly for the files published so far. It can still be wrong when
a file does not match the expected geometry - for example a table that is not a
standard index but happens to be municipality-sized, or two different tables in
the same edition that share a row-count band. When you know the level, pass it
explicitly through unit_type to bypass detection.
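As a hedged illustration of the override described above, the sketch below reads the example file bundled with the package while passing unit_type explicitly, then reads the level back from the "unit_type" attribute. It assumes il.cbs.muni is installed (the requireNamespace() guard only makes the snippet a no-op otherwise) and that the bundled file is a municipality-level table, as the detection message in the Examples section suggests.

```r
if (requireNamespace("il.cbs.muni", quietly = TRUE)) {
  path <- system.file("extdata", "24_22_375t1.xlsx", package = "il.cbs.muni")

  # Supplying unit_type skips content-based detection entirely
  muni <- il.cbs.muni::read_cbs_index(
    path = path,
    year = 2019,
    index_type = "ses",
    unit_type = "muni"
  )

  # The level that was used is stored on the result as an attribute
  print(attr(muni, "unit_type"))
}
```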
Examples
read_cbs_index(
path = system.file("extdata", "24_22_375t1.xlsx", package = "il.cbs.muni"),
year = 2019,
index_type = "ses"
) |>
dplyr::glimpse()
#> Detected municipality-level ses index file (266 raw rows).
#> Rows: 255
#> Columns: 12
#> $ `מעמד מוניציפלי_MUNICIPAL STATUS` <chr> …
#> $ `סמל יישוב_CODE OF LOCALITY` <chr> …
#> $ `שם רשות מקומית` <chr> …
#> $ מחוז_DISTRICT <chr> …
#> $ `אוכלוסיית המדד 2019[1]_INDEX POPULATION 2019[1]` <chr> …
#> $ `ערך מדד 2019[2]_INDEX VALUE 2019[2]` <chr> …
#> $ `דירוג 2019[3]_RANK 2019[3]` <chr> …
#> $ `אשכול 2019[4]_CLUSTER 2019[4]` <chr> …
#> $ `דירוג 2017[3]_RANK 2017[3]` <chr> …
#> $ `אשכול 2017[4]_CLUSTER 2017[4]` <chr> …
#> $ `הפרש (אשכול 2019 פחות אשכול 2017)_DIFFERENCE (CLUSTER 2019 MINUS CLUSTER 2017)` <chr> …
#> $ `NAME OF LOCAL AUTHORITY` <chr> …
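Because each call reads a single file, several files can be read by iterating over a table of paths and years. The sketch below is a minimal illustration, assuming il.cbs.muni and purrr are installed; the files table is hypothetical and, for lack of other known files, holds only the bundled example file. Setting quiet = TRUE suppresses the per-file level message.

```r
if (requireNamespace("il.cbs.muni", quietly = TRUE) &&
    requireNamespace("purrr", quietly = TRUE)) {
  # Hypothetical lookup table: one row per file, with the year its data is for
  files <- data.frame(
    path = system.file("extdata", "24_22_375t1.xlsx", package = "il.cbs.muni"),
    year = 2019
  )

  # One call per file; quiet = TRUE silences the per-file message
  tables <- purrr::map2(
    files$path, files$year,
    \(path, year) il.cbs.muni::read_cbs_index(
      path = path, year = year, index_type = "ses", quiet = TRUE
    )
  )

  # Name each result by its year so it can be looked up later
  names(tables) <- files$year
}
```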