Getting CSV File Data into SQL Azure

I have been using a trial to see if SQL Azure might be a good place to store regular data extractions for some of my auto dealership clients. I have always wanted to develop using SQL Server because of the (it seems) universal connectivity ability. Nearly all the web development frameworks I use make connection to SQL Server.

So I decided to give it a try and after building a vehiclesalescurrent database as follows:

I then researched the proper way to load the table. BULK INSERT is not supported in SQL Azure; only the Bulk Copy Program (bcp Utility) is available to import csv files. The bcp Utiltiy, being a command line program is not very intuitive. It is rather powerful though as it can do import (in), export (out) and create format files. Whatever possible switch you can think of, it has. I forged ahead and tried to do the import without a format file (which is possible apparently). The Format File is an XML file that tells bcp how the csv file maps to the table in the database.  I received error after error mostly relating to cast issues and  invalid end of record. I was under the impression that a csv file had a rather common end of record code known as CRLF or carraige return/line feed. I opened my csv file in notepad++ with the view codes option on to make sure there wasn’t anything unusual going on. There wasn’t.

Having no success the easy way I decided to create a Format file which would tell bcp what to look for definitively as an end of record. The bcp utility will accept either an XML or non-XML Format file and bcp utility will create either for you. I chose the XML format file because I just felt it might be less buggy. This spit out a file easily but I still had to make a few modifications to the resulting XML file. In particular, bcp got the column separator wrong (“\t” for tab) and I changed it to a comma (“\,” ) as the file was csv. Also the last column of data, in my case column 21 needed the terminator “\r\n” which is the offending return and newline (CRLF) codes! Make sure the slashes are the right way; for some reason (I think I saw it in a blog post!) I put forward slashes and I needed the help desk to straighten me out. Anyway here is the bcp command to create an XML format file:
bcp MyDB.dbo.VehicleSalesCurrent format nul -c -x -f C:\JohnDonnelly\VehicleSalesCurrent.xml -U johndonnelly@xxxxxxxx -S tcp:xxxxxxx.database.windows.net -P mypassword

And here is the final correct Format file that you would  include in the same directory as the vehiclesalescurrent.txt data file when running the bcp utility to upload the data to SQL Azure:

<?xml version=”1.0″?>

<BCPFORMAT xmlns=”http://schemas.microsoft.com/sqlserver/2004/bulkload/format&#8221; xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”&gt;

<RECORD>

<FIELD ID=”1″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”12″/>

<FIELD ID=”2″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”24″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”3″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”24″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”4″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”24″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”5″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”16″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”6″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”11″/>

<FIELD ID=”7″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”16″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”8″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”100″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”9″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”100″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”10″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”100″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”11″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”12″/>

<FIELD ID=”12″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”11″/>

<FIELD ID=”13″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”12″/>

<FIELD ID=”14″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”30″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”15″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”100″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”16″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”100″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”17″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”40″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”18″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”12″/>

<FIELD ID=”19″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”34″ COLLATION=”SQL_Latin1_General_CP1_CI_AS”/>

<FIELD ID=”20″ xsi:type=”CharTerm” TERMINATOR=”,” MAX_LENGTH=”30″/>

<FIELD ID=”21″ xsi:type=”CharTerm” TERMINATOR=”\r\n” MAX_LENGTH=”12″/>

</RECORD>

<ROW>

<COLUMN SOURCE=”1″ NAME=”ID” xsi:type=”SQLINT”/>

<COLUMN SOURCE=”2″ NAME=”StockNo” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”3″ NAME=”DealType” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”4″ NAME=”SaleType” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”5″ NAME=”Branch” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”6″ NAME=”SalesDate” xsi:type=”SQLDATE”/>

<COLUMN SOURCE=”7″ NAME=”CustNo” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”8″ NAME=”Name” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”9″ NAME=”Address” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”10″ NAME=”City” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”11″ NAME=”Zip” xsi:type=”SQLINT”/>

<COLUMN SOURCE=”12″ NAME=”BirthDate” xsi:type=”SQLDATE”/>

<COLUMN SOURCE=”13″ NAME=”Year” xsi:type=”SQLINT”/>

<COLUMN SOURCE=”14″ NAME=”Make” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”15″ NAME=”Model” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”16″ NAME=”Body” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”17″ NAME=”Color” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”18″ NAME=”Mileage” xsi:type=”SQLINT”/>

<COLUMN SOURCE=”19″ NAME=”VIN” xsi:type=”SQLNVARCHAR”/>

<COLUMN SOURCE=”20″ NAME=”Down” xsi:type=”SQLMONEY”/>

<COLUMN SOURCE=”21″ NAME=”Term” xsi:type=”SQLINT”/>

</ROW>

</BCPFORMAT>

As you can see there is a section that defines what the bcp utility should expect regarding each of the fields and a section that defines what column name goes where in the receiving table.

As I mentioned above the Field Terminators are important (of course) and I found that bcp had them incorrect (“\t”) for a CSV file. Also it had the field terminator for the last field the same as all of the others fields (which was “\t”). This needs to be set to the CR LF codes. Also as I mentioned above the terminator on my last column (21) needed to have back slashes, and I think my google search yielded bad advice and I had forward slashes so:

<FIELD ID=”21″ xsi:type=”CharTerm” TERMINATOR=”/r/n” MAX_LENGTH=”12″/>

Obviously it should be:

<FIELD ID=”21″ xsi:type=”CharTerm” TERMINATOR=”\r\n” MAX_LENGTH=”12″/>

With the good Format FIle I ran the following bcp utility statement to handle the import:
# This one works, starts importing at Row 2, make sure dates are YYYY-MM-DD
bcp MyDB.dbo.VehicleSalesCurrent in C:\JohnDonnelly\VehicleSalesCurrent.csv -F 2 -f C:\JohnDonnelly\VehicleSalesCurrent.xml -U johndonnelly@xxxxxxx -S tcp:xxxxxxxx.database.windows.net -P mypassword -e C:\JohnDonnelly\err.txt

The -e switch throws out a nice err.txt file that gives you much more information about errors than the console does. For complete switch explanation follow the bcp Utility link above.

With the above Format File things got a little better however I had to open a case because I kept getting an “Invalid character value for cast specification” error which cited a SQLDate column type.

I learned from the help desk that the SQLDATE columns in my csv file needed to look like yyyy-mm-dd so I had to use a custom format each time if I opened the csv in Excel. You have to reformat the date columns as yyyy-mm-dd then save the file. That is the quick way anyway.

The support person said there was a bug in bcp relating to dates which leads to the last item and that is out of 9,665 rows there were 34 rows that wouldn’t move due to this error:

“Col 12 – Invalid character value for cast specification”

Column 12 was a birth date of SQLDATE type.

Furthermore 32 of the 34 rows were null in Column 12 so I agree with support that bcp Utility or SQL Azure is buggy with regards to SQLDATE.

I hope this helps someone battling the bcp utility with SQL Azure!