The dplyr package, which is one of my favorite R packages, works with in-memory data and with data stored in databases. In this extensive post, I will share my experience of using dplyr to work with databases. The basic functions of the dplyr package are covered by Teja in another post at DataScience+.

Using dplyr with databases is a big advantage when our data is too large to load into R or when loading it slows down our analytics. If our data resides in a database, rather than loading the whole dataset into R, we can work with just the subsets or aggregates that we are interested in. Further, if we have many data files, putting our data in a database, rather than in CSV or another format, offers better security and is easier to manage.

dplyr is a really powerful package for data manipulation, data exploration and feature engineering in R, and if you do not know SQL, it lets you work with databases from within R. Further, dplyr functions are easy to write and read. dplyr treats database tables as data frames and uses lazy evaluation: it delays the actual operation until necessary and loads data into R from the database only when we need it. For someone who knows Spark, the processes and even the functions are similar.

dplyr supports several databases, such as SQLite, MySQL and PostgreSQL. In this post, we will see how to work with an SQLite database. You can get more information from the dplyr database vignette.

When people take drugs and experience adverse events, they can report them to the FDA. These data are in the public domain, and anyone can download and analyze them. In this post, we will download the demographic information of the patients, the drugs they used and the indications they used them for, the reactions and the outcomes. Then, we will put all the datasets in a database and use dplyr to work with the database. You can read more about the adverse events data in my previous post.

You can simply run the code below and it will download the adverse events data and create one large dataset for each category by merging the various datasets. For demonstration purposes, let's use the adverse event reports from 2013-2015. The adverse events are released in quarterly data files (a single data file for every category every three months).

Load R Packages

library(dplyr)

Download adverse events data

year_start=2013

The pipe operator %>% passes the output of its left-hand side as the first argument to the function on its right-hand side.

Find the top ten countries with the highest number of adverse events

demography%>%group_by(Country= occr_country)%>%
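To make the lazy-evaluation idea concrete, here is a minimal sketch of a "top countries" query run against an in-memory SQLite database. It is not the code from this post. It assumes the DBI, RSQLite and dbplyr packages are installed alongside dplyr. The demography_toy rows are invented purely for illustration and are not FDA data. The summarise, arrange and head steps are one plausible way to finish such a query.

```r
library(dplyr)  # assumes dbplyr, DBI and RSQLite are also installed

# Invented toy rows standing in for the demography table (not real FDA data)
demography_toy <- data.frame(
  primaryid    = 1:8,
  occr_country = c("US", "US", "US", "GB", "GB", "JP", "CA", NA),
  stringsAsFactors = FALSE
)

# In-memory SQLite database, so nothing is written to disk
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, demography_toy, "demography")

# tbl() returns a lazy reference to the table; no rows are pulled into R yet
demography <- tbl(con, "demography")

# Build the query step by step; still nothing has run in the database
top_countries <- demography %>%
  filter(!is.na(occr_country)) %>%
  group_by(Country = occr_country) %>%
  summarise(Total = n()) %>%
  arrange(desc(Total)) %>%
  head(10)

# Inspect the SQL that dplyr generated for the pipeline
show_query(top_countries)

# collect() executes the query and returns an ordinary in-memory tibble
result <- collect(top_countries)
print(result)

DBI::dbDisconnect(con)
```

Only the collect() call brings data into R, and by then the database has already reduced the table to at most ten aggregated rows.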