UKB_database

Setting up the database

You will need the scripts in the database folder.

How to use

To extract data from your database in R, see Extracting data in R.

Optional Extras

Stata

If you want to access data from the database using Stata, you will additionally need to generate the .do and .dct files from the download using ukbconv.

This might make the whole database system seem superfluous, but it has one benefit: if your data download is too large for ukbconv to convert to a Stata .dta file in one go, it doesn’t matter. Only the .do and .dct files are needed, and ukbconv writes those first, after which you can kill the process.

To extract the data, follow the procedure described in Extracting Data in Stata.

Mapping variable names

When running ukb_db() you can also pass in the path to a mapping sheet that gives human-readable names to the UKB variables. This will produce a CSV file containing all unnamed variables and their descriptions, so you can add them to your mapping sheet.

The variables are saved in the database with the raw UKB variable names (e.g. f.52.0.0). When you extract the data, you can supply a mapping sheet (the default is our standard renaming spreadsheet Renaming_List_UPDATE_Sep2020_TEU.csv) and the selected names will be applied to the extracted data.

Storing the raw names in the database keeps the renaming flexible: if we want to change one of the variable names in the spreadsheet, or convert to an entirely new naming system, we don’t have to regenerate the database.
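To make the idea concrete, here is a minimal base R sketch of applying a mapping sheet to an extract at read time. It is not the code used by these scripts: the toy data, the readable names, and the mapping columns raw_name and new_name are all assumptions made for illustration, and the real spreadsheet may be laid out differently.

```r
# Hypothetical mapping sheet: raw UKB name alongside a readable name.
# The column names `raw_name` and `new_name` are assumptions.
mapping <- data.frame(
  raw_name = c("f.eid", "f.52.0.0"),
  new_name = c("ID", "MonthOfBirth"),
  stringsAsFactors = FALSE
)

# Toy extract that uses the raw names, as they are stored in the database.
extract <- data.frame(
  f.eid = c(1001, 1002),
  f.52.0.0 = c(3, 11),
  f.34.0.0 = c(1950, 1962)
)

# Look up each column in the mapping; keep the raw name where none exists.
idx <- match(names(extract), mapping$raw_name)
renamed <- extract
names(renamed) <- ifelse(is.na(idx), names(extract), mapping$new_name[idx])

# Columns with no readable name yet, i.e. candidates to add to the sheet.
unnamed <- names(extract)[is.na(idx)]
```

Because the renaming happens only on the extracted data frame, editing the mapping changes the output names without touching the stored data.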

Cautionary Note

The R package duckdb is still under development, so new versions of the package are often not backwards compatible. In practice, a database written under one version of duckdb may not be readable by a later version.

Please consider using some form of package management, for example renv, to keep control over package versions.
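One way to catch a mismatch early is to record the duckdb version used to build the database and compare it with the installed version before connecting. The sketch below is illustrative only: check_duckdb_version() and the version strings are hypothetical and are not part of these scripts.

```r
# Hypothetical helper: compare the duckdb version recorded when the database
# was built against the version currently installed.
check_duckdb_version <- function(built_with, installed) {
  same <- package_version(built_with) == package_version(installed)
  if (!same) {
    warning(sprintf(
      "Database was built with duckdb %s but %s is installed.",
      built_with, installed
    ))
  }
  same
}

# Example call with made-up version strings.
check_duckdb_version("0.2.1", "0.2.1")
```

In practice the installed version could be obtained with as.character(utils::packageVersion("duckdb")). With renv, renv::snapshot() records the package versions in renv.lock and renv::restore() reinstalls those versions later.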