Learning SQL Notes #6: Data Generation, Manipulation, and Conversion

Working with String Data

String Generation

Types:

CHAR Holds fixed-length, blank-padded strings.

varchar Holds variable-length strings.

text (MySQL and SQL Server) or clob (Oracle Database) Holds very large variable-length strings (generally referred to as documents in this context).

CREATE TABLE string_tbl 
(char_fld CHAR(30), 
vchar_fld VARCHAR(30), 
text_fld TEXT
);

INSERT INTO string_tbl (char_fld, vchar_fld, text_fld) 
VALUES ('This is char data',
'This is varchar data',
'This is text data');

If you want to have a longer string, you can

UPDATE string_tbl 
SET vchar_fld = 'This is a piece of extremely long varchar data';

but then:

ERROR 1406 (22001): Data too long for column 'vchar_fld' at row 1

NOTE: Since MySQL 6.0, the default behavior is now “strict” mode, which means that exceptions are thrown when problems arise, whereas in older versions of the server the string would have been truncated and a warning issued.

SELECT @@session.sql_mode;
SET sql_mode='ansi'; /*Go back to the older ver.*/
SELECT @@session.sql_mode;

Now extra will be truncated.

Including single quotes

SELECT quote(text_fld) 
FROM string_tbl;

Output:

QUOTE(text_fld)
‘This string didn't work, but it does now’

Including special characters

The SQL Server and MySQL servers include the built-in function char() so that you can build strings from any of the 255 characters in the ASCII character set (Oracle Database users can use the chr() function).

SELECT CHAR(128,129,130,131,132,133,134,135,136,137);

Output:

CHAR(128,129,130,131,132,133,134,135,136,137)
Çüéâäàåçêë

R codes:

coderange <- c(128,129,130,131,132,133,134,135,136,137)
rawToChar(as.raw(coderange),multiple=TRUE)

You can also concatenate two strings:

SELECT CONCAT('danke sch', CHAR(148), 'n');

Output:

CONCAT(‘danke sch’, CHAR(148), ‘n’)
danke schön

R codes:

paste('danke sch', rawToChar(as.raw(148)), 'n')
paste0()

See: https://www.r-bloggers.com/2011/03/ascii-code-table-in-r/

  • Oracle Database/PostgreSQL users can use the concatenation operator (||) instead of the concat() function, as in:
SELECT 'danke sch' || CHR(148) || 'n' FROM dual;
  • SQL Server does not include a concat() function, so you will need to use the concatenation operator (+), as in:
SELECT 'danke sch' + CHAR(148) + 'n'

String Manipulation

String functions that return numbers

To find the length of a string:

LENGTH()

SELECT LENGTH(char_fld) char_length, 
LENGTH(vchar_fld) varchar_length, 
LENGTH(text_fld) text_length
FROM string_tbl;

R codes:

length()

To find the index of a character in a string:

POSITION()

SELECT POSITION('characters' IN vchar_fld)
FROM string_tbl;

R codes:

match('y',x)
which('y' %in% x)

Note: When working with databases that the first character in a string is at position 1. A return value of 0 from instr() indicates that the substring could not be found, not that the substring was found at the first position in the string.

If you want to start your search at something other than the first character of your target string, you will need to use the locate() function, which is similar to the position() function except that it allows an optional third parameter, which is used to define the search’s start position. The locate() function is also proprietary, whereas the position() function is part of the SQL:2003 standard.

SELECT LOCATE('is', vchar_fld, 5)
FROM string_tbl;

R codes:

match('y',x[5:])
which('y' %in% x[5:])

Oracle Database instr(): Mimics the position() function when provided with two arguments and mimics the locate() function when provided with three arguments.

SQL Server charindx(): similar to Oracle’s instr() function.

strcmp() (MySQL ONLY) takes two strings as arguments and returns one of the following:

  • −1 if the first string comes before the second string in sort order
  • 0 if the strings are identical
  • 1 if the first string comes after the second string in sort order
SELECT vchar_fld
FROM string_tbl
ORDER BY vchar_fld;
vchar_fld
12345
abcd
QRSTUV
qrstuv
xyz
SELECT STRCMP('12345','12345') 12345_12345,
STRCMP('abcd','xyz') abcd_xyz, 
STRCMP('abcd','QRSTUV') abcd_QRSTUV, 
STRCMP('qrstuv','QRSTUV') qrstuv_QRSTUV, /*Case insensitive*/
STRCMP('12345','xyz') 12345_xyz,
STRCMP('xyz','qrstuv') xyz_qrstuv;
12345_12345abcd_xyzabcd_QRSTUVqrstuv_QRSTUV12345_xyzxyz_qrstuv
0−1−10−11

Add or replace characters in the middle of a string: insert() 4 parameters: the original string, the start position, the number of characters to replace (0 for inserting a string), and the replacement string.

SELECT INSERT('goodbye world', 9, 0, 'cruel ') string;
/*goodbye cruel world*/
SELECT INSERT('goodbye world', 1, 7, 'hello') string;
/*hello world*/
SELECT SUBSTRING('goodbye cruel world', 9, 5);
/*cruel*/

For other SQL,

/*Oracle*/
SELECT REPLACE('goodbye world', 'goodbye', 'hello') FROM dual;
/*hello world*/
SELECT substr('goodbye cruel world', 9, 5);
/*cruel*/

/*SQL Server*/
SELECT STUFF('hello world', 1, 5, 'goodbye cruel')
/*goodbye cruel world*/
SELECT SUBSTRING('goodbye cruel world', 9, 5);
/*cruel*/

Working with Numeric Data

SELECT (37 * 59) / (78 - (8 * 6));

Performing Arithmetic Functions & Controlling Number Precision & Handling Signed Data

Function nameDescription
acos( x )Calculates the arc cosine of x
asin( x )Calculates the arc sine of x
atan( x )Calculates the arc tangent of x
cos( x )Calculates the cosine of x
sin( x )Calculates the sine of x
tan( x )Calculates the tangent of x
cot( x )Calculates the cotangent of x
exp( x )Calculates ex
ln( x )Calculates the natural log of x
sqrt( x )Calculates the square root of x

Some useful functions in R and SQL (See Appendix for full results):

SQLR
MOD( x )%%
POW( x )^
CEIL( x )ceiling()
FLOOR( x )floor()
ROUND( x )round()
TRUNCATE( x )trunc()
SIGN( x )sign()
ABS( x )abs()

Working with Temporal Data

Dealing with Time Zones

/*MySQL*/
SELECT @@global.time_zone, @@session.time_zone;
SET time_zone = 'Europe/Zurich';

/*Oracle Database*/
ALTER SESSION TIMEZONE = 'Europe/Zurich'

From:

@@global.time_zone@@session.time_zone
SYSTEMSYSTEM

To:

@@global.time_zone@@session.time_zone
SYSTEMEurope/Zurich

R codes:

Sys.timezone()
Sys.setenv(TZ = "Europe/Zurich")

Generating Temporal Data

You can generate temporal data via any of the following means:

  • Copying data from an existing date, datetime, or time column
  • Executing a built-in function that returns a date, datetime, or time
  • Building a string representation of the temporal data to be evaluated by the server

String representations of temporal data

ComponentDefinitionRange
YYYYYear, including century1000 to 9999
MMMonth01 (January) to 12 (December)
DDDay01 to 31
HHHourRange 00 to 23
HHHHours−838 to 838
MI(elapsed) Minute00 to 59
SSSecond00 to 59
TypeDefault format
dateYYYY-MM-DD
datetimeYYYY-MM-DD HH:MI:SS
timestampYYYY-MM-DD HH:MI:SS
timeHHH:MI:SS

String-to-date conversions

  • A simple query that returns a datetime value using the cast() function
SQLR (lubridate)
CAST(‘2019-09-17 15:30:00’ AS DATETIME)as_datetime()
STR_TO_DATE(‘September 17, 2019’, ‘%M %d, %Y’)as.Date(…, format=…)
CAST(‘2019-09-17’ AS DATE)as.Date()
CAST(‘108:17:57’ AS TIME)as.POSIXlt()
/*MySQL*/
SELECT str_to_date();

/*Oracle Database*/
SELECT to_date();

/*SQL server*/
SELECT convert();

/*Current System Time*/
SELECT CURRENT_DATE(), CURRENT_TIME(), CURRENT_TIMESTAMP();

Common notations for both R and SQL:

Format componentDescription
%MMonth name (January to December)
%mMonth numeric (01 to 12)
%dDay numeric (01 to 31)
%jDay of year (001 to 366)
%WWeekday name (Sunday to Saturday)
%YYear, four-digit numeric
%yYear, two-digit numeric
%HHour (00 to 23)
%hHour (01 to 12)
%iMinutes (00 to 59)
%sSeconds (00 to 59)
%fMicroseconds (000000 to 999999)
%pA.M. or P.M.

Manipulating Temporal Data

Interval types for DATE_ADD() and EXTRACT()

Interval nameDescription
secondNumber of seconds
minuteNumber of minutes
hourNumber of hours
dayNumber of days
monthNumber of months
yearNumber of years
minute_secondNumber of minutes and seconds, separated by “:”
hour_secondNumber of hours, minutes, and seconds, separated by “:”
year_monthNumber of years and months, separated by “-”

Temporal functions that return dates

The same result can be performed on three different servers:

/*MySQL*/
UPDATE employee 
SET birth_date = DATE_ADD(birth_date, INTERVAL '9-11' YEAR_MONTH)
WHERE emp_id = 4789;

/*Oracle Database*/
UPDATE employee 
SET birth_date = ADD_MONTHS(birth_date, 119)
WHERE emp_id = 4789;

/*SQL server*/
UPDATE employee 
SET birth_date = DATEADD(MONTH, 119, birth_date)
WHERE emp_id = 4789

Temporal functions that return strings

Some other functions for temporal data:

/*MySQL*/
SELECT LAST_DAY('2019-09-17'); /*Extract last day of Sept*/
SELECT DAYNAME('2019-09-18'); /*Wednesday*/
SELECT EXTRACT(YEAR FROM '2019-09-18 22:19:05'); /*2019*/

/*SQL Server*/
SELECT DATEPART(YEAR, GETDATE())

Temporal functions that return numbers

SELECT DATEDIFF('2019-09-03', '2019-06-21');
/*74*/

SELECT DATEDIFF('2019-09-03 23:59:59', '2019-06-21 00:00:01');
/*74, time has no effects*/

SELECT DATEDIFF('2019-06-21', '2019-09-03');
/*-74*/

/*SQL Server*/
SELECT DATEDIFF(DAY, '2019-06-21', '2019-09-03')

Conversion Functions

SELECT CAST('1456328' AS SIGNED INTEGER);
/*1456328*/

SELECT CAST('999ABC111' AS UNSIGNED INTEGER);
/*999 with warnings about truncation*/

Appendix for Codes

SELECT MOD(10,4);
/*2*/
SELECT MOD(20.75,4); /*Real argument*/
/*0.75*/
SELECT POW(2,8);
/*256*/

SELECT CEIL(72.445), FLOOR(72.445);
/*73  72*/
SELECT CEIL(72.000000001), FLOOR(72.999999999);
/*73  72*/

SELECT ROUND(72.49999), ROUND(72.5), ROUND(72.50001);
/*72  73  73*/

SELECT ROUND(72.0909, 1), ROUND(72.0909, 2), ROUND(72.0909, 3);
/*72.1  72.09  72.091*/
SELECT TRUNCATE(72.0909, 1), TRUNCATE(72.0909, 2), TRUNCATE(72.0909, 3);
/*72.0  72.09  72.090*/

/*SQL Server*/
SELECT ROUND(72.0909, 1, 1)

R codes:

%%
^
ceiling()
floor()
round()
trunc()
SELECT account_id, SIGN(balance), ABS(balance) 
FROM account;

R codes:

sign()
abs()

Hope I can finish this before July. Stay safe.

Related