Simple database management using SQLite in R

Learn how to use SQLite in R, a very lightweight relational database management system (RDBMS).

Create databases and tables

The first step is to create a database. Use the dbConnect() function to create an appropriate database for the mtcars dataset.

# Load the RSQLite Library
library(RSQLite)
# Load the mtcars as an R data frame put the row names as a column.
data("mtcars")
mtcars$car_names <- rownames(mtcars)
rownames(mtcars) <- c()
head(mtcars)
# Create a connection to our new database, CarsDB.db
# you can check that the .db file has been created on your working directory
conn <- dbConnect(RSQLite::SQLite(), "CarsDB.db")

Once the database is created, you can continue to use the dbWriteTable() function to create a table in the database. This function can accept multiple parameters:

conn: connect to your SQLite database Name: the name you want to use for the table value: inserted data

After that, you can use the function dbListTables() and SQLite database connection as parameters to check whether the table has been successfully created.

# Write the mtcars dataset into a table names mtcars_data
dbWriteTable(conn, "cars_data", mtcars)
# List all the tables available in the database
dbListTables(conn)
##'cars_data'

If you have multiple data, you can add more data to the existing table by setting the optional parameter append = TRUE in dbWriteTable(). For example, you can create a new table containing some cars and manufacturers by adding two different data.

# Create toy data frames
car <- c('Camaro', 'California', 'Mustang', 'Explorer')
make <- c('Chevrolet','Ferrari','Ford','Ford')
df1 <- data.frame(car,make)
car <- c('Corolla', 'Lancer', 'Sportage', 'XE')
make <- c('Toyota','Mitsubishi','Kia','Jaguar')
df2 <- data.frame(car,make)
# Add them to a list
dfList <- list(df1,df2)
# Write a table by appending the data frames inside the list
for(k in 1:length(dfList)){
    dbWriteTable(conn,"Cars_and_Makes", dfList[[k]], append = TRUE)
}
# List all the Tables
dbListTables(conn)
##"Cars_and_Makes" "cars_data"

Ensure that all data is in the new table

dbGetQuery(conn, "SELECT * FROM Cars_and_Makes")

car

make

Camaro

Chevrolet

California

Ferrari

Mustang

Ford

Explorer

Ford

Corolla

Toyota

Lancer

Mitsubishi

Sportage

Kia

XE

Jaguar

Execute SQL query

You can use dbGetQuery() to execute a valid SQL query. The function has the following parameters:

conn: connect to SQLite database Query: SQL query executed

NOTE: through RSQLIte, you can execute any query, from simple SELECT statements to joins (except RIGHT OUTER JOINS and FULL OUTER JOINS, which are not supported in SQLite).

# Gather the first 5 rows in the cars_data table
dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 5")
# Get the car names and horsepower starting with M that have 6 or 8 cylinders
dbGetQuery(conn,"SELECT car_names, hp, cyl FROM cars_data
                 WHERE car_names LIKE 'M%' AND cyl IN (6,8)")

car_names

hp

cyl

Mazda RX4

110

6

Mazda RX4 Wag

110

6

Merc 280

123

6

Merc 280C

123

6

Merc 450SE

180

8

Merc 450SL

180

8

Merc 450SLC

180

8

Maserati Bora

335

8

To store the result of the query so that further operations can be performed in R, you only need to assign the result of the query to a variable.

avg_HpCyl <- dbGetQuery(conn,"SELECT cyl, AVG(hp) AS 'average_hp'FROM cars_data
                 GROUP BY cyl
                 ORDER BY average_hp")
avg_HpCyl

Add variable query

One of the biggest advantages of operating a SQLite database from R is the ability to use parametric queries. That is, the SQLite database can be queried using the variables available in the R workspace.

# Lets assume that there is some user input that asks us to look only into cars that have over 18 miles per gallon (mpg)
# and more than 6 cylinders
mpg <-  18
cyl <- 6
Result <- dbGetQuery(conn, 'SELECT car_names, mpg, cyl FROM cars_data WHERE mpg >= ? AND cyl >= ?', params = c(mpg,cyl))
Result

car_names

mpg

cyl

Mazda RX4

21.0

6

Mazda RX4 Wag

21.0

6

Hornet 4 Drive

21.4

6

Hornet Sportabout

18.7

8

Valiant

18.1

6

Merc 280

19.2

6

Pontiac Firebird

19.2

8

Ferrari Dino

19.7

6

Queries that do not return tabular data

You may want to execute SQL queries that do not necessarily return tabular data. Examples of these operations include inserting, updating, or deleting table records. To do this, we can use the function dbExecute(), which takes an SQLite database connection and an SQL query as parameters.

# Visualize the table before deletion
dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10")
# Delete the column belonging to the Mazda RX4.
dbExecute(conn, "DELETE FROM cars_data WHERE car_names = 'Mazda RX4'")
# Visualize the new table after deletion
dbGetQuery(conn, "SELECT * FROM cars_data LIMIT 10")

close

When the SQLite database operation is completed in R, it is very important to call the function dbDisconnect(). This ensures that the resources that the database connection has been using are released.

# Close the database connection to CarsDB
dbDisconnect(conn)

Posted by frymaster on Thu, 14 Apr 2022 14:11:07 +0930