Microsoft SQL Server is a powerful relational database management system that can transform raw data into meaningful reports that can be sent to any device. SQL Server performs well in tPC-E OLTP workloads, TPC-H data warehouse workloads, and real-world application performance benchmarks.

This article will show you how to use the R language in Microsoft SQL Server by looking at sample data and generating some graphs using rxHistogram from the RevoScaleR and R base Hist functions (which are already included in the R database). And demonstrates how to call the R function to save the results of an application file from a transact-SQL stored procedure.

View the data

Developing data science solutions often involves in-depth data exploration and data visualization. In the original public data set, taxi identifiers and trip records are provided in separate files. However, to make the sample data easy to use, the two raw datasets are connected to medallion columns, hack_license, and Pickup_datetime. The resulting low sampling rate dataset has 1703957 rows and 23 columns.

Taxi identifier

  • The Medallion column represents the unique ID number of the taxi.
  • Hack_license contains the taxi driver’s license number (anonymous).

Itinerary and expense records

  • Each trip record includes the pickup and drop-off locations, time and distance traveled.
  • Each fee record includes payment information, such as payment type, total payment, and tip amount.
  • The last three columns can be used for a variety of machine learning tasks.
  • The label column values are based on the TIP_AMOUNT column, using the following business rules:
    Derived column name The rules
    tipped If tip_amount > 0, tipped = 1; Otherwise tipped = 0
    tip_class class 0:tip_amount = $0
    Class 1:tip_amount > $0 and tip_amount <= $5
    Class 2:tip_amount > $5 and tip_amount <= $10
    Class 3:tip_amount > $10 and tip_amount <= $20
    class 4:tip_amount > $20

Create stored procedures using rxHistogram to plot the data

Create a plot using rxHistogram from the enhanced R function provided by RevoScaleR. The histogram drawn in this step is based on a Transact-SQL query and can be wrapped in a stored procedure.

In SQL Server Management Studio, right-click the NYCTaxi_Sample database and select New Query. Paste the following script to create a stored procedure and draw the histogram.

CREATE PROCEDURE [dbo].[RxPlotHistogram]
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @query nvarchar(max) =  
  N'SELECT tipped FROM nyctaxi_sample'  
  EXECUTE sp_execute_external_script @language = N'R',  
                                     @script = N' image_file = tempfile(); jpeg(filename = image_file); #Plot histogram rxHistogram(~tipped, data=InputDataSet, col=''lightgreen'', title = ''Tip Histogram'', xlab =''Tipped or not'', ylab =''Counts''); dev.off(); OutputDataSet <- data.frame(data=readBin(file(image_file, "rb"), what=raw(), n=1e6)); ',  
   @input_data_1 = @query  
   WITH RESULT SETS ((plot varbinary(max)));  
END
GOCopy the code

Key points in this script include:

  • This variable defines the query text that is passed to the R script as an argument to the script input variable. For R scripts running as external processes, there should be a one-to-one mapping between your script input and the input of the sp_execute_external_script system stored procedure that starts the R session in SQL Server.
  • R a variable in the script that defines the storage image.
  • Call the function rxHistogram from the RevoScaleR library to generate the plot.
  • R device is set to off because this command is being run using an external script in SQL Server. Typically when advanced drawing commands are issued in R, R opens a graphical window to invoke the device. If you want to write to a file or process some other type of output, you can turn the device off.
  • R graphics are serialized into R data frames for output.

Performs stored procedures and exports binary data to image files using BCP

The image returned by this stored procedure is a Varbinary data stream and obviously cannot be viewed directly. However, you can use the BCP utility to retrieve this Varbinary data and save it as an image file on the client computer.

  1. inManagement StudioRun the following statement:
    EXEC [dbo].[RxPlotHistogram]Copy the code
  2. Open the PowerShell command prompt and run the following command, providing the corresponding instance name, database name, user name, and credentials as parameters.
    bcp "exec RxPlotHistogram" queryout "plot.jpg" -S <SQL Server instance name> -d  NYCTaxi_Sample  -U <user name> -P <password> -TCopy the code
  3. If the connection is successful, you will be prompted for details about the graphics file format.
    Enter the file storage type of field plot [varbinary(max)]: 
    Enter prefix-length of field plot [8]: 0
    Enter length of field plot [0]:
    Enter field terminator [none]:
    
    Do you want to save this format information in a file? [Y/n]
    Host filename [bcp.fmt]:Copy the code

    The results of

    Starting copy... 1 rows copied. Network Packet Size (bytes): 4096 Clock Time (Ms.) Total: 3922 Average: (0.25 rows per SEC.)Copy the code
  4. Press ENTER at each prompt to accept the default Settings, except for the following changes:
    • For prefix-length of field plot, type 0.
    • If you want to save the output parameters for later reuse, type Y.
  5. The output file is created in the same directory as the PowerShell command. To view the chart, simply open the file plot.jpg.

Create stored procedures using Hist and multiple output formats

Typically, data scientists generate multiple visualizations to understand the data from different perspectives. In this example, you will create a stored procedure called RPlotHist to write histograms, scatter plots, and other R graphs to JPG and PDF formats. This stored procedure uses the Hist function to create histograms that export binary data for common formats such as JPG, PDF, and PNG.

In SQL Server Management Studio, right-click the NYCTaxi_Sample database and select New Query. Paste the following script to create a stored procedure and draw the histogram. In this example, the name is RPlotHist.

CREATE PROCEDURE [dbo].[RPlotHist]  
AS  
BEGIN  
  SET NOCOUNT ON;  
  DECLARE @query nvarchar(max) =  
  N'SELECT cast(tipped as int) as tipped, tip_amount, fare_amount FROM [dbo].[nyctaxi_sample]'  
  EXECUTE sp_execute_external_script @language = N'R',  
  @script = N' # Set output directory for files and check for existing files with same names mainDir <- ''C:\\temp\\plots'' dir.create(mainDir, recursive = TRUE, showWarnings = FALSE) setwd(mainDir); print("Creating output plot files:", quote=FALSE) # Open a jpeg file and output histogram of tipped variable in that file. dest_filename = tempfile(pattern = ''rHistogram_Tipped_'', tmpdir = mainDir) dest_filename = paste(dest_filename, ''.jpg'',sep="") print(dest_filename, quote=FALSE); jpeg(filename=dest_filename); hist(InputDataSet$tipped, col = ''lightgreen'', xlab=''Tipped'', ylab = ''Counts'', main = ''Histogram, Tipped''); dev.off(); # Open a pdf file and output histograms of tip amount and fare amount. # Outputs two plots in one row dest_filename = tempfile(pattern = ''rHistograms_Tip_and_Fare_Amount_'', tmpdir = mainDir) dest_filename = paste(dest_filename, ''.pdf'',sep="") print(dest_filename, quote=FALSE); pdf(file=dest_filename, height=4, width=7); Par (mfrow = c (1, 2)); hist(InputDataSet$tip_amount, col = ''lightgreen'', xlab=''Tip amount ($)'', ylab = ''Counts'', main = ''Histogram, Tip amount'', xlim = c(0,40), 100); hist(InputDataSet$fare_amount, col = ''lightgreen'', xlab=''Fare amount ($)'', ylab = ''Counts'', main = ''Histogram, Fare amount'', xlim = c(0,100), 100); dev.off(); # Open a pdf file and output an xyplot of tip amount vs. fare amount using lattice; Dest_filename = tempfile(plotted here, otherwise file is large)'rXYPlots_Tip_vs_Fare_Amount_'', tmpdir = mainDir) dest_filename = paste(dest_filename, ''.pdf'',sep="") print(dest_filename, quote=FALSE); pdf(file=dest_filename, height=4, width=4); Plot (tip_amount ~ fare_amount, data = InputDataSet[sample(nrow(InputDataSet), 10000),], ylim = c(0,50), Xlim = c(0,150), cex=.5, PCH =19, col=''darkgreen'', main = ''Tip amount by Fare amount'', xlab=''Fare Amount ($)'', ylab = ''Tip Amount ($)''); dev.off(); ',  
 @input_data_1 = @query  
 ENDCopy the code

The output of the SELECT query within this stored procedure is stored in the default R data frame InputDataSet, and various R plotting functions can then be called to generate the actual graph file.

All files will be saved to the local folder C:\temp\Plots. This target folder is supplied as part of the stored procedure to the parameter definition of the R script. You can change this target folder by changing the value of the variable mainDir.

To output the file to another folder, change the variable value of mainDir in the R script embedded in the stored procedure. You can also modify the script to output different formats, and so on.

Execute the stored procedure

Run the following statements to export the binary drawing data to JPEG and PDF file formats.

EXEC RPlotHistCopy the code

The results of

STDOUT message(s) from external script:
[1] Creating output plot files:[1] C:\temp\plots\rHistogram_Tipped_18887f6265d4.jpg[1] 

C:\temp\plots\rHistograms_Tip_and_Fare_Amount_1888441e542c.pdf[1]

C:\temp\plots\rXYPlots_Tip_vs_Fare_Amount_18887c9d517b.pdfCopy the code

The numbers in the file name are generated randomly to ensure that no errors occur when attempting to write to an existing file.

View the output

To view the drawing, open the target folder and view the file for the R code in the created stored procedure.

  1. Go to the folder indicated in the standard output message (in the example, C:\temp\plots).
  2. Open rHistogram_Tipped. JPG to display the trips that received tips and the number of trips that received tips (this histogram is similar to the one generated in the previous step).
  3. Open rhistograms_tip_and_fare_amount.pdf to see the amount distribution plotted based on the amount of the tip.
  4. Open rxyplots_tip_vs_fare_amount.pdf to view a scatter chart of tip amounts on the x and y axes.