Tuesday, 25 December 2012

COPY: How to skip first column that has a different delimiter

I had this very interesting problem sometime back wherein customer had a data file with first column delimiter different from rest of the columns. Customer wanted to load this file using COPY command and wanted to skip first column.

The data file looked like this
The first column of the data file has ~ as a field delimeter
The rest of the columns have pipe (|) as a filed delimeter

$ cat load.out
SkipMe1 ~ A1 | B1 | C1 
SkipMe2 ~ A2 | B2 | C2 

The table definition is below

CREATE TABLE t ( 
AColumn VARCHAR(10), 
BColumn VARCHAR(10), 
CColumn VARCHAR(10) 
); 
CREATE PROJECTION tp ( 
AColumn, 
BColumn, 
CColumn 

AS SELECT * from t; 

The data once loaded was supposed to be as shown below:

AColumn | BColumn | CColumn 
------+-----------+------- 
A1 | B1| C1 
A2 | B2| C2


Skipping a column in COPY command is easy using "Filler", and so is specifying column delimiter. But I had never tried before if we can specify delimiter for a specific column and that too in conjunction with Filler option. So I tested the solution out and voila...

The COPY command I used is shown below. Note the use of "FILLER" and specific use of "DELIMITER" only for first column.

$ cat load.out | vsql -c "copy t(c1 FILLER varchar(10) delimiter '~',c2 FILLER varchar(10), c3 FILLER varchar(10),c4 FILLER varchar(10), AColumn as c2, BColumn as c3, CColumn as c4) from stdin direct"; 


Sunday, 23 December 2012

Changing row delimiter in VSQL


Consider a scenario when you export some data with VSQL and some of the string columns have new line characters as part of column value. Now if you want to load the data file to some other table using COPY statement, or may be parsing the data using some of your custom parser, the new line characters in string column are bound to pose problem.

One of the easiest and cleanest way to tackle this problem is to change VSQL row delimiter to some character other than new line. So that when you export data, the rows will be delimited by the character of your choice and hence your COPY command or your custom parser can tell new line characters in column value to row delimiter.


In the example below, record separator is changed from default \n to '~'. The fields are separated by '|' and records are separated by '~"


At command line:

Use -R option (along with -A for unaligned output) to specify field separator.

$ vsql -o my_table.out -c "select * from my_table" -A -R '~';
$ cat my_table.out
id|status~1001|t~1002|f~1003|t~1004|f~(4 rows)


At vsql prompt:

Use \pset to specify new record separator
vsql=> \pset recordsep '~' 
Record separator is "~". 

Set output format as unaligned
vsql=> \a 
Output format is unaligned. 

vsql=> \o my_table2.out
vsql=> select * from my_table;
vsql=> \o