Importing a Dataset in Quadstat, R and SAS
This guide discusses how to import a dataset using Quadstat, R, and SAS, in that order.
To import a dataset into Quadstat, you must first register for a free account. Please see this guide on how to register an account with Quadstat.
Once you have registered and set your password, click the "Create Dataset" link at the top of the page. From there, you will be presented with a screen similar to the one shown below. Feel free to click the "Tour" tab to start a guided tour of the import screen.
From here you can choose a method to create a dataset:
- Use randomly generated numbers as the dataset values
- Upload a dataset file from your computer
- Copy and paste a dataset into a textarea
- Start with an empty dataset
Depending on which method is chosen, you will have the option to specify whether the first line is a header and which separator the dataset uses. If there are any errors while importing the dataset, you will be redirected to the previous screen. Otherwise, you will be taken to a screen showing the grid of the newly imported dataset, and the dataset is ready to use. Datasets created as an authenticated user are saved indefinitely or until a delete request is made.
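If you want to reproduce these options in plain R, each creation method has a rough base-R analogue. This is an illustrative sketch only, not Quadstat's implementation. The column names `x` and `y`, the temporary file, and all values are made up for the example.

```r
# Illustrative base-R analogues of the four dataset-creation methods.
# Column names, file contents, and values are hypothetical.

set.seed(1)  # make the random example reproducible

# 1. Randomly generated numbers as the dataset values
random_df <- data.frame(x = rnorm(5), y = runif(5))

# 2. Upload a file: read a file from disk (a temporary file stands in here)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,4"), csv_path)
file_df <- read.table(csv_path, header = TRUE, sep = ",")

# 3. Copy and paste: parse text directly, choosing the header and separator
pasted_df <- read.table(text = "x;y\n5;6\n7;8", header = TRUE, sep = ";")

# 4. Start with an empty dataset: zero rows, two numeric columns
empty_df <- data.frame(x = numeric(0), y = numeric(0))
```

The `text =` argument of `read.table()` is what makes the copy-and-paste case work without a file on disk.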
To import a tab-separated (TSV) dataset with R, use a command similar to this one. This is essentially the command Quadstat uses, but without a graphical interface.
stent30 <- read.table("/home/ubuntu/stent30.tsv", header = TRUE, sep = "\t")
This command reads the dataset into the stent30 variable; this terse syntax is part of what makes R so powerful. The first parameter is the path to the TSV file (this path may differ depending on your operating system). The sep = "\t" argument tells R to use the tab character as the separator between values. The header of a dataset is its first line, which contains the column names. In this case, we know the TSV file does have a header, so we pass header = TRUE. (We're using the stent30 and stent365 datasets from the OpenIntro textbook Advanced High School Statistics.)
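Because the stent30.tsv file itself is not reproduced here, the sketch below writes a tiny tab-separated file to a temporary location and reads it back with the same `read.table()` arguments. The column names and values are made up for illustration and are not the real stent30 data.

```r
# Write a small, made-up tab-separated file (NOT the real stent30 data).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("group\toutcome",
             "treatment\tevent",
             "control\tno event"), tsv_path)

# header = TRUE: the first line supplies the column names.
# sep = "\t": only the tab character separates values, so the
# space inside "no event" stays part of a single value.
demo <- read.table(tsv_path, header = TRUE, sep = "\t")
str(demo)

# header = FALSE: the first line is treated as an ordinary data row,
# and R assigns default column names V1, V2, ...
raw_rows <- read.table(tsv_path, header = FALSE, sep = "\t")
print(names(raw_rows))  # [1] "V1" "V2"
print(nrow(raw_rows))   # [1] 3
```

Setting `header` incorrectly is easy to spot: either the column names end up as a data row, or the first data row is consumed as column names.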
Remember to save the R workspace when exiting; otherwise, the dataset will need to be re-imported.
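Another way to avoid re-importing is to save the data itself. The sketch below uses base R's `saveRDS()`/`readRDS()` for a single object and `save()`/`load()` for a named set of objects. The data frame and the temporary file paths are made up for illustration; in practice you would choose permanent paths of your own.

```r
# A made-up data frame standing in for the imported stent30 data.
stent_demo <- data.frame(group = c("treatment", "control"),
                         outcome = c("event", "no event"),
                         stringsAsFactors = FALSE)

# Option 1: save a single object and restore it later without re-importing.
rds_path <- tempfile(fileext = ".rds")
saveRDS(stent_demo, rds_path)
restored <- readRDS(rds_path)
print(identical(restored, stent_demo))  # [1] TRUE

# Option 2: save one or more named objects to an .RData file,
# then load them into a fresh environment (e.g. in a later session).
rdata_path <- tempfile(fileext = ".RData")
save(stent_demo, file = rdata_path)
ws_env <- new.env()
load(rdata_path, envir = ws_env)
print(exists("stent_demo", envir = ws_env, inherits = FALSE))  # [1] TRUE
```

To save the whole workspace instead, answer yes to the "Save workspace image?" prompt when quitting, or call `q(save = "yes")`. This writes a .RData file to the working directory, which R restores by default the next time it starts from that directory.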
The SAS code to import a dataset is considerably more complex. If you need help setting up SAS on your computer, you can read the previous blog post about downloading and installing SAS.
To start importing a dataset with SAS Studio, go to the top left-hand corner of the screen. You will see a dropdown with an asterisk icon. Choose "Import Data."
Next, you will be prompted to choose a file from your computer.
After you select a file, you will be presented with a screen to configure the import. Remember: the file must be in the folder you configured with VirtualBox; SAS will not be able to see any files outside that directory. For most values the default settings will suffice, but choosing the delimiter is important. The first dataset that the OpenIntro textbook Advanced High School Statistics discusses is tab-delimited, so you will need to enter '09'x, the hexadecimal code for the tab character, as the delimiter.
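SAS writes the tab delimiter as the hexadecimal literal '09'x because the tab character has code 9 (hex 09) in ASCII. R's "\t" escape denotes the same character, as the short check below shows. The sample line "treatment\tno event" is made up.

```r
# The tab escape "\t" and the hexadecimal escape "\x09" are the same character.
tab_bytes <- charToRaw("\t")
print(tab_bytes)                 # [1] 09
print(identical("\t", "\x09"))   # [1] TRUE

# Splitting a tab-delimited line on that character separates its fields.
fields <- strsplit("treatment\tno event", "\x09", fixed = TRUE)[[1]]
print(fields)                    # [1] "treatment" "no event"
```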
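Here is the raw SAS code generated from these selections. DBMS=DLM tells PROC IMPORT to read a delimited file, the DELIMITER statement specifies the tab character as the hexadecimal literal '09'x, and GETNAMES=YES reads the column names from the first line. PROC CONTENTS then describes the imported WORK.IMPORT1 dataset.
/* Generated Code (IMPORT) */
/* Source File: stent30.txt */
/* Source Path: /folders/myshortcuts/Quadstat */
/* Code generated on: 5/10/17, 11:17 AM */
FILENAME REFFILE '/folders/myshortcuts/Quadstat/stent30.txt';
PROC IMPORT DATAFILE=REFFILE
    DBMS=DLM
    OUT=WORK.IMPORT1;
    DELIMITER='09'x;
    GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT1; RUN;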
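PROC CONTENTS lists the variables in WORK.IMPORT1 along with their types. If you are following along in R, a rough counterpart is a small helper built on `vapply()` and `class()`. The function name `describe_columns()` is hypothetical and is not part of base R, and the example data is made up.

```r
# Hypothetical helper: a rough R counterpart to the variable listing
# that PROC CONTENTS prints (column number, name, and R class).
describe_columns <- function(df) {
  data.frame(
    num = seq_along(df),
    variable = names(df),
    class = unname(vapply(df, function(col) class(col)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up example data, not the real stent30 dataset.
demo_df <- data.frame(group = c("treatment", "control"),
                      days = c(30L, 30L),
                      stringsAsFactors = FALSE)
info <- describe_columns(demo_df)
print(info)
cat("Observations:", nrow(demo_df), "Variables:", ncol(demo_df), "\n")
```

For a quick look without a helper, `str(demo_df)` prints similar information in a more compact form.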
If everything goes well, you will see a screen like this. You can scroll down to see the entire dataset.
From here, you will be able to work with the data in future projects. Remember to save the state of the VirtualBox virtual machine after successfully importing the dataset; otherwise, you will have to repeat this process.