Learn how to use SQLite in R, a very lightweight relational database management system (RDBMS).
Create databases and tables
The first step is to create a database. Use the dbConnect() function to create an appropriate database for the mtcars dataset.
# Load the RSQLite Library library(RSQLite) # Load the mtcars as an R data frame put the row names as a column. data("mtcars") mtcars$car_names <- rownames(mtcars) rownames(mtcars) <- c() head(mtcars) # Create a connection to our new database, CarsDB.db # you can check that the .db file has been created on your working directory conn <- dbConnect(RSQLite::SQLite(), "CarsDB.db")
Once the database is created, you can continue to use the dbWriteTable() function to create a table in the database. This function can accept multiple parameters:
conn: connect to your SQLite database Name: the name you want to use for the table value: inserted data
After that, you can use the function dbListTables() and SQLite database connection as parameters to check whether the table has been successfully created.
# Write the mtcars dataset into a table names mtcars_data dbWriteTable(conn, "cars_data", mtcars) # List all the tables available in the database dbListTables(conn) ##'cars_data'
If you have multiple data, you can add more data to the existing table by setting the optional parameter append = TRUE in dbWriteTable(). For example, you can create a new table containing some cars and manufacturers by adding two different data.
# Create toy data frames car <- c('Camaro', 'California', 'Mustang', 'Explorer') make <- c('Chevrolet','Ferrari','Ford','Ford') df1 <- data.frame(car,make) car <- c('Corolla', 'Lancer', 'Sportage', 'XE') make <- c('Toyota','Mitsubishi','Kia','Jaguar') df2 <- data.frame(car,make) # Add them to a list dfList <- list(df1,df2) # Write a table by appending the data frames inside the list for(k in 1:length(dfList)){ dbWriteTable(conn,"Cars_and_Makes", dfList[[k]], append = TRUE) } # List all the Tables dbListTables(conn) ##"Cars_and_Makes" "cars_data"
Ensure that all data is in the new table
dbGetQuery(conn, "SELECT * FROM Cars_and_Makes")
car | make |
---|---|
Camaro | Chevrolet |
California | Ferrari |
Mustang | Ford |
Explorer | Ford |
Corolla | Toyota |
Lancer | Mitsubishi |
Sportage | Kia |
XE | Jaguar |
Execute SQL query
You can use dbGetQuery() to execute a valid SQL query. The function has the following parameters:
conn: connect to SQLite database Query: SQL query executed
NOTE: through RSQLIte, you can execute any query, from simple SELECT statements to joins (except RIGHT OUTER JOINS and FULL OUTER JOINS, which are not supported in SQLite).
# Gather the first 5 rows in the cars_data table dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 5")
# Get the car names and horsepower starting with M that have 6 or 8 cylinders dbGetQuery(conn,"SELECT car_names, hp, cyl FROM cars_data WHERE car_names LIKE 'M%' AND cyl IN (6,8)")
car_names | hp | cyl |
---|---|---|
Mazda RX4 | 110 | 6 |
Mazda RX4 Wag | 110 | 6 |
Merc 280 | 123 | 6 |
Merc 280C | 123 | 6 |
Merc 450SE | 180 | 8 |
Merc 450SL | 180 | 8 |
Merc 450SLC | 180 | 8 |
Maserati Bora | 335 | 8 |
To store the result of the query so that further operations can be performed in R, you only need to assign the result of the query to a variable.
avg_HpCyl <- dbGetQuery(conn,"SELECT cyl, AVG(hp) AS 'average_hp'FROM cars_data GROUP BY cyl ORDER BY average_hp") avg_HpCyl
Add variable query
One of the biggest advantages of operating a SQLite database from R is the ability to use parametric queries. That is, the SQLite database can be queried using the variables available in the R workspace.
# Lets assume that there is some user input that asks us to look only into cars that have over 18 miles per gallon (mpg) # and more than 6 cylinders mpg <- 18 cyl <- 6 Result <- dbGetQuery(conn, 'SELECT car_names, mpg, cyl FROM cars_data WHERE mpg >= ? AND cyl >= ?', params = c(mpg,cyl)) Result
car_names | mpg | cyl |
---|---|---|
Mazda RX4 | 21.0 | 6 |
Mazda RX4 Wag | 21.0 | 6 |
Hornet 4 Drive | 21.4 | 6 |
Hornet Sportabout | 18.7 | 8 |
Valiant | 18.1 | 6 |
Merc 280 | 19.2 | 6 |
Pontiac Firebird | 19.2 | 8 |
Ferrari Dino | 19.7 | 6 |
Queries that do not return tabular data
You may want to execute SQL queries that do not necessarily return tabular data. Examples of these operations include inserting, updating, or deleting table records. To do this, we can use the function dbExecute(), which takes an SQLite database connection and an SQL query as parameters.
# Visualize the table before deletion dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10") # Delete the column belonging to the Mazda RX4. dbExecute(conn, "DELETE FROM cars_data WHERE car_names = 'Mazda RX4'") # Visualize the new table after deletion dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10")
close
When the SQLite database operation is completed in R, it is very important to call the function dbDisconnect(). This ensures that the resources that the database connection has been using are released.
# Close the database connection to CarsDB dbDisconnect(conn)