Load Data CSV into MySQL
For this blog, I will import a CSV file to a MySQL server(using MAMP) to create a practice platform for SQL statements. More specifically, I will import the titanic dataset(train.csv), which I can find here (https://www.kaggle.com/c/titanic). This tutorial will be broken down into three steps:
- Installing MAMP
Installing and editing configuration parameters
- Creating Database & Table
Creating Database and a table of titanic meta-data
- Load Data Infile
Populating the table with the data
I can download the MAMP software from here (https://www.mamp.info/en/downloads/). After running the installation, the setup might allow you to install MAMP Pro; however, I opted not to install it for this tutorial. When selecting the ‘Destination Location,’ I left it at the default directory, C:\MAMP.
Upon opening the new software, click on ‘Start Servers’ and then ‘Open WebStart Page,’ which will open your browser to a locally hosted homepage.
If phpMyAdmin directs you to a MySQL database page, you have successfully installed MAMP. I can find the phpMyAdmin under the Tools tab.
By default, secure_file_priv is set to NULL and will return an error when importing a CSV from a local directory. To troubleshoot this, open the my.ini file under the MAMP folder and write [secure-file-priv = “”]. For Windows users, I can find the my.ini file here:
Open the file, and write secure-file-priv = “” under [mysqld].
password = ...
port = ...
socket = ...
secure-file-priv = ""
Doing this now will avoid errors when I ‘Load Data Local Infile.’
Creating Database & Table
Open the phpMyAdmin page from the MAMP local homepage of which should open the locally hosted MySQL server. Create a database by clicking on the ‘Databases’ tab, inserting a name (titanic_tutorial), and Creating.
Upon database creation, it should have made a folder in the database>mysql directory named ‘titanic_tutorial.’ For Windows users, I can find the folder here: C:\MAMP\db\mysql (the base dir is found in the my.ini file from before). Expand into the titanic_tutorial folder and save the titanic’s train.csv into the folder.
Return back to phpMyAdmin to create a table of headers using the titanic’s train.csv meta-data.
CREATE TABLE train (
PassengerId INT NOT NULL,
Survived INT DEFAULT NULL,
Pclass INT DEFAULT NULL,
Name VARCHAR(128) DEFAULT NULL,
Sex VARCHAR(64) DEFAULT NULL,
Age FLOAT DEFAULT NULL,
SibSp INT DEFAULT NULL,
Parch INT DEFAULT NULL,
Ticket VARCHAR(64) DEFAULT NULL,
Fare FLOAT DEFAULT NULL,
Cabin VARCHAR(64) DEFAULT NULL,
Embarked VARCHAR(64) DEFAULT NULL
Load Data Local Infile
Lastly, I will populate the table ‘train’ using the train.csv that was saved in the C:\MAMP\db\mysql\titanic_tutorial folder. Go back to the SQL tab, and copy/paste the following SQL code.
LOAD DATA LOCAL INFILE ‘train.csv’
INTO TABLE train
FIELDS TERMINATED BY ‘,’
ENCLOSED BY ‘“‘
LINES TERMINATED BY ‘\n’
IGNORE 1 ROWS
(PassengerId, @vSurvived, @vPclass, @vName, @vSex, @vAge, @vSibSp, @vParch, @vTicket, @vFare, @vCabin, @vEmbarked)
Survived = NULLIF(@vSurvived, ‘’),
Pclass = NULLIF(@vPclass, ‘’),
Name = NULLIF(@vName, ‘’),
Sex = NULLIF(@vSex, ‘’),
Age = NULLIF(@vAge, ‘’),
SibSp = NULLIF(@vSibSp, ‘’),
Parch = NULLIF(@vParch, ‘’),
Ticket = NULLIF(@vTicket, ‘’),
Fare = NULLIF(@vFare, ‘’),
Cabin = NULLIF(@vCabin, ‘’),
Embarked = NULLIF(@vEmbarked, ‘’)
The first row of the dataset is the header/column names, and therefore I need to skip the first row. In addition, the train.csv dataset contains some missing values, and I will need to set NULL if the values are blank. Store the column names with ‘@v’ as a prefix for columns that can contain NULLs. Then SET the columns with a NULLIF condition as shown above. I need to add the prefix here to prevent the value 0 from being translated to NULL. Value 0 should be 0, and Value ‘’ should be NULL.