<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>SQL | Siqi Zheng</title><link>https://siqi-zheng.rbind.io/category/sql/</link><atom:link href="https://siqi-zheng.rbind.io/category/sql/index.xml" rel="self" type="application/rss+xml"/><description>SQL</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><lastBuildDate>Fri, 11 Jun 2021 15:00:00 +0000</lastBuildDate><image><url>https://siqi-zheng.rbind.io/images/icon_hu1f65844ca26c0df97a9719a407d829c0_98767_512x512_fill_lanczos_center_2.png</url><title>SQL</title><link>https://siqi-zheng.rbind.io/category/sql/</link></image><item><title>Learning SQL Notes #16: SQL and Big Data</title><link>https://siqi-zheng.rbind.io/post/2021-06-11-sql-notes-16/</link><pubDate>Fri, 11 Jun 2021 15:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-11-sql-notes-16/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#introduction-to-apache-drill">Introduction to Apache Drill&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#querying-files-using-drill">Querying Files Using Drill&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#querying-mysql-using-drill">Querying MySQL Using Drill&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#querying-mongodb-using-drill">Querying MongoDB Using Drill&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#drill-with-multiple-data-sources">Drill with Multiple Data Sources&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#future-of-sql">Future of SQL&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The data landscape has changed quite a bit over the past decade, and SQL is changing to meet the needs of today’s rapidly evolving environments. Many organizations that had used relational databases exclusively just a few years ago are now also housing data in Hadoop clusters, data lakes, and NoSQL databases. At the same time, companies are struggling to find ways to gain insights from the ever-growing volumes of data, and the fact that this data is now spread across multiple data stores, perhaps both on-site and in the cloud, makes this a daunting task.&lt;/p>
&lt;p>Because SQL is used by millions of people and has been integrated into thousands of applications, it makes sense to leverage SQL to harness this data and make it actionable. Over the past several years, a new breed of tools has emerged to enable SQL access to structured, semi-structured, and unstructured data: tools such as Presto, Apache Drill, and Toad Data Point. This chapter explores one of these tools, Apache Drill, to demonstrate how data in different formats and stored on different servers can be brought together for reporting and analysis.&lt;/p>
&lt;h1 id="introduction-to-apache-drill">Introduction to Apache Drill&lt;/h1>
&lt;p>Compelling features:&lt;/p>
&lt;ul>
&lt;li>Facilitates queries across multiple data formats, including delimited data, JSON, Parquet, and log files&lt;/li>
&lt;li>Connects to relational databases, Hadoop, NoSQL, HBase, and Kafka, as well as specialized data formats such as PCAP, BlockChain, and others&lt;/li>
&lt;li>Allows creation of custom plug-ins to connect to most any other data store&lt;/li>
&lt;li>Requires no up-front schema definitions&lt;/li>
&lt;li>Supports the SQL:2003 standard&lt;/li>
&lt;li>Works with popular business intelligence (BI) tools like Tableau and Apache Superset&lt;/li>
&lt;/ul>
&lt;p>Using Drill, you can connect to any number of data sources and begin querying, without the need to first set up a metadata repository.&lt;/p>
&lt;h1 id="querying-files-using-drill">Querying Files Using Drill&lt;/h1>
&lt;p>Let’s start by using Drill to query data in a file. Drill understands how to read several different file formats, including packet capture (PCAP) files, which are in binary format and contain information about packets traveling over a network. All I have to do when I want to query a PCAP file is to configure Drill’s dfs (distributed filesystem) plug-in to include the path to the directory containing my files, and I’m ready to write queries.&lt;/p>
&lt;p>Drill includes partial support for information_schema, so you can find out high-level information about the data files in your workspace:&lt;/p>
&lt;pre>
SELECT file_name, is_directory, is_file, permission
FROM &lt;b>information_schema.`files`&lt;/b>
WHERE schema_name = 'dfs.data';
SELECT * FROM dfs.data.`attack-trace.pcap`
&lt;b>WHERE 1=2;&lt;/b> -- returns no rows, so only the column names are shown
&lt;/pre>
&lt;p>Counts the number of packets sent from each IP address to each destination port:&lt;/p>
&lt;pre>
SELECT src_ip, dst_port,
count(*) AS packet_count
FROM dfs.data.`attack-trace.pcap`
GROUP BY src_ip, dst_port;
&lt;/pre>
&lt;p>Aggregates packet information for each second:&lt;/p>
&lt;pre>
SELECT trunc(extract(second from `timestamp`)) as packet_time,
count(*) AS num_packets,
sum(packet_length) AS tot_volume
FROM dfs.data.`attack-trace.pcap`
GROUP BY trunc(extract(second from `timestamp`));
&lt;/pre>
&lt;p>Put backticks (`) around timestamp because it is a reserved word.&lt;/p>
&lt;p>You can query files stored locally, on your network, in a distributed filesystem, or in the cloud. Drill has built-in support for many file types, but you can also build your own plug-in to allow Drill to query any type of file.&lt;/p>
&lt;h1 id="querying-mysql-using-drill">Querying MySQL Using Drill&lt;/h1>
&lt;p>Why Apache Drill? Because you can write queries using Drill that combine data from different sources, so you might write a query that joins data from MySQL, Hadoop, and comma-delimited files, for example.&lt;/p>
&lt;p>The first step is to choose a database:&lt;/p>
&lt;pre>
apache drill (information_schema)> &lt;b>use mysql.sakila&lt;/b>;
&lt;b>show tables;&lt;/b>
&lt;/pre>
&lt;p>Simple joins, &lt;code>group by&lt;/code>, &lt;code>order by&lt;/code>, and &lt;code>having&lt;/code> clauses work in Drill as well. However, Drill works with many relational databases, not just MySQL, so some features of the language may differ (e.g., data conversion functions). For more information, read
&lt;a href="http://drill.apache.org/docs/sql-reference/" target="_blank" rel="noopener">Drill’s documentation about their SQL implementation&lt;/a>.&lt;/p>
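&lt;p>As a rough sketch of such a query, the following joins two Sakila tables and filters on an aggregate. It assumes the standard Sakila &lt;code>customer&lt;/code> and &lt;code>rental&lt;/code> tables and that &lt;code>use mysql.sakila&lt;/code> has already been run; the threshold of 40 rentals is an arbitrary value chosen for illustration.&lt;/p>
&lt;pre>&lt;code class="language-sql">-- Customers with more than 40 rentals, busiest first
SELECT c.first_name, c.last_name, count(*) num_rentals
FROM customer c
  INNER JOIN rental r
  ON c.customer_id = r.customer_id
GROUP BY c.first_name, c.last_name
HAVING count(*) > 40
ORDER BY 3 DESC;
&lt;/code>&lt;/pre>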
&lt;h1 id="querying-mongodb-using-drill">Querying MongoDB Using Drill&lt;/h1>
&lt;p>After using Drill to query the sample Sakila data in MySQL, the next logical step is to convert the Sakila data to another commonly used format, store it in a nonrelational database, and use Drill to query the data. I decided to convert the data to JSON and store it in MongoDB, which is one of the more popular NoSQL platforms for document storage. Drill includes a plug-in for MongoDB and also understands how to read JSON documents, so it was relatively easy to load the JSON files into Mongo and begin writing queries.&lt;/p>
&lt;p>After the JSON files have been loaded, the Mongo database contains two collections (films and customers), and the data in these collections spans nine different tables from the MySQL Sakila database.&lt;/p>
&lt;p>Group the data by rating and actor:&lt;/p>
&lt;pre>
SELECT g_pg_films.Rating,
g_pg_films.actor_list.`First name` first_name,
g_pg_films.actor_list.`Last name` last_name,
count(*) num_films
FROM
(SELECT f.Rating, flatten(Actors) actor_list
FROM films f
WHERE f.Rating IN ('G','PG')
) g_pg_films
GROUP BY g_pg_films.Rating,
g_pg_films.actor_list.`First name`,
g_pg_films.actor_list.`Last name`
HAVING count(*) > 9;
&lt;/pre>
&lt;p>The following query returns all customers who have spent more than $80 renting films rated either G or PG:&lt;/p>
&lt;pre>
SELECT first_name, last_name,
sum(cast(cust_payments.payment_data.Amount
as decimal(4,2))) tot_payments
FROM
(SELECT cust_data.first_name,
cust_data.last_name,
f.Rating,
flatten(cust_data.rental_data.Payments)
payment_data
FROM films f
INNER JOIN
(SELECT c.`First Name` first_name,
c.`Last Name` last_name, flatten(c.Rentals) rental_data
FROM customers c
) cust_data
ON f._id = cust_data.rental_data.filmID
WHERE f.Rating IN ('G','PG')
) cust_payments
GROUP BY first_name, last_name
HAVING
sum(cast(cust_payments.payment_data.Amount as decimal(4,2))) > 80;
&lt;/pre>
&lt;p>The innermost query, which I named cust_data, flattens the Rentals list so that the cust_payments query can join to the films collection and also flatten the Payments list. The outermost query groups the data by customer name and applies a having clause to filter out customers who spent $80 or less on films rated G or PG.&lt;/p>
&lt;h1 id="drill-with-multiple-data-sources">Drill with Multiple Data Sources&lt;/h1>
&lt;p>As long as Drill is configured to connect to both databases, you just need to describe where to find the data by qualifying each table or collection with its storage plug-in and schema name:&lt;/p>
&lt;pre>
&lt;b>FROM mysql.sakila.film f&lt;/b>
&lt;b>FROM mongo.sakila.customers c&lt;/b>
&lt;/pre>
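&lt;p>Put together, a cross-source version of the earlier payments query might look like the sketch below, with films read from MySQL and customers read from MongoDB. It assumes the plug-ins are named &lt;code>mysql&lt;/code> and &lt;code>mongo&lt;/code>, that the MySQL &lt;code>film&lt;/code> table uses the standard Sakila columns &lt;code>film_id&lt;/code> and &lt;code>rating&lt;/code>, and that &lt;code>filmID&lt;/code> in the Mongo documents needs a cast to match the integer key; adjust these to your own setup.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT first_name, last_name,
  sum(cast(cust_payments.payment_data.Amount
    as decimal(4,2))) tot_payments
FROM
 (SELECT cust_data.first_name,
    cust_data.last_name,
    f.rating,
    flatten(cust_data.rental_data.Payments) payment_data
  FROM mysql.sakila.film f            -- relational side
    INNER JOIN
    (SELECT c.`First Name` first_name,
       c.`Last Name` last_name,
       flatten(c.Rentals) rental_data
     FROM mongo.sakila.customers c    -- document side
    ) cust_data
    -- assumed cast: align the Mongo value with the integer film_id
    ON f.film_id = cast(cust_data.rental_data.filmID as integer)
  WHERE f.rating IN ('G','PG')
 ) cust_payments
GROUP BY first_name, last_name
HAVING
  sum(cast(cust_payments.payment_data.Amount as decimal(4,2))) > 80;
&lt;/code>&lt;/pre>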
&lt;h1 id="future-of-sql">Future of SQL&lt;/h1>
&lt;p>The future of relational databases is somewhat unclear. It is possible that the big data technologies of the past decade will continue to mature and gain market share. It’s also possible that a new set of technologies will emerge, overtaking Hadoop and NoSQL, and taking additional market share from relational databases. However, most companies still run their core business functions using relational databases, and it should take a long time for this to change.&lt;/p>
&lt;p>The future of SQL seems a bit clearer, however. While the SQL language started out as a mechanism for interacting with data in relational databases, tools like Apache Drill act more like an abstraction layer, facilitating the analysis of data across various database platforms. In this author’s opinion, this trend will continue, and SQL will remain a critical tool for data analysis and reporting for many years.&lt;/p></description></item><item><title>Learning SQL Notes #15: Working with Large Databases</title><link>https://siqi-zheng.rbind.io/post/2021-06-11-sql-notes-15/</link><pubDate>Fri, 11 Jun 2021 09:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-11-sql-notes-15/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#partitioning">Partitioning&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#partitioning-concepts">Partitioning Concepts&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#table-partitioning">Table Partitioning&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#index-partitioning">Index Partitioning&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#partitioning-methods">Partitioning Methods&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#range-partitioning">Range partitioning&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#list-partitioning">List partitioning&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#hash-partitioning">Hash partitioning&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#composite-partitioning">Composite partitioning&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#partitioning-benefits">Partitioning Benefits&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#clustering">Clustering&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#sharding">Sharding&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#big-data">Big Data&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#hadoop">Hadoop&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#nosql-and-document-databases">NoSQL and Document Databases&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#cloud-computing">Cloud Computing&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#conclusion">Conclusion&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>While relational databases face various challenges as data volumes continue to grow, there are strategies such as partitioning, clustering, and sharding that allow companies to continue to utilize relational databases by spreading data across multiple storage tiers and servers. Other companies have decided to move to big data platforms such as Hadoop in order to handle huge data volumes.&lt;/p>
&lt;h1 id="partitioning">Partitioning&lt;/h1>
&lt;p>The following tasks become more difficult and/or time consuming as a table grows past a few million rows:&lt;/p>
&lt;ul>
&lt;li>Query execution requiring full table scans&lt;/li>
&lt;li>Index creation/rebuild&lt;/li>
&lt;li>Data archival/deletion&lt;/li>
&lt;li>Generation of table/index statistics&lt;/li>
&lt;li>Table relocation (e.g., move to a different tablespace)&lt;/li>
&lt;li>Database backups&lt;/li>
&lt;/ul>
&lt;p>The best way to prevent administrative issues from occurring in the future is to break large tables into pieces, or &lt;em>partitions&lt;/em>, when the table is first created (although tables can be partitioned later, it is easier to do so initially). Administrative tasks can be performed on individual partitions, often in parallel, and some tasks can skip one or more partitions entirely.&lt;/p>
&lt;h2 id="partitioning-concepts">Partitioning Concepts&lt;/h2>
&lt;p>While every partition must have the same schema definition (columns, column types, etc.), there are several administrative features that can differ for each partition:&lt;/p>
&lt;ul>
&lt;li>Partitions may be stored on different tablespaces, which can be on different physical storage tiers.&lt;/li>
&lt;li>Partitions can be compressed using different compression schemes.&lt;/li>
&lt;li>Local indexes (more on this shortly) can be dropped for some partitions.&lt;/li>
&lt;li>Table statistics can be frozen on some partitions, while being periodically refreshed on others.&lt;/li>
&lt;li>Individual partitions can be pinned into memory or stored in the database’s flash storage tier.&lt;/li>
&lt;/ul>
&lt;h2 id="table-partitioning">Table Partitioning&lt;/h2>
&lt;p>The partitioning scheme available in most relational databases is &lt;em>horizontal partitioning&lt;/em>, which assigns entire rows to exactly one partition. Tables may also be partitioned &lt;em>vertically&lt;/em>, which involves assigning sets of columns to different partitions, but this must be done manually. When partitioning a table horizontally, you must choose a &lt;em>partition key&lt;/em>, which is the column whose values are used to assign a row to a particular partition. In most cases, a table’s partition key consists of a single column, and a &lt;em>partitioning function&lt;/em> is applied to this column to determine in which partition each row should reside.&lt;/p>
&lt;h2 id="index-partitioning">Index Partitioning&lt;/h2>
&lt;p>If your partitioned table has indexes, you will get to choose whether a particular index should stay intact, known as a &lt;em>global index&lt;/em>, or be broken into pieces such that each partition has its own index, which is called a &lt;em>local index&lt;/em>. Global indexes span all partitions of the table and are useful for queries that do not specify a value for the partition key.&lt;/p>
&lt;h2 id="partitioning-methods">Partitioning Methods&lt;/h2>
&lt;h3 id="range-partitioning">Range partitioning&lt;/h3>
&lt;p>The most common usage is to break up tables by date ranges.&lt;/p>
&lt;pre>&lt;code class="language-sql">CREATE TABLE sales
(sale_id INT NOT NULL,
cust_id INT NOT NULL,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
amount DECIMAL(9,2)
)
PARTITION BY RANGE (yearweek(sale_date))
(PARTITION s1 VALUES LESS THAN (202002),
PARTITION s2 VALUES LESS THAN (202003),
PARTITION s3 VALUES LESS THAN (202004),
PARTITION s4 VALUES LESS THAN (202005),
PARTITION s5 VALUES LESS THAN (202006),
PARTITION s999 VALUES LESS THAN (MAXVALUE)
);
&lt;/code>&lt;/pre>
&lt;p>Read and modify partitions:&lt;/p>
&lt;pre>
SELECT partition_name, partition_method, partition_expression
&lt;b>FROM information_schema.partitions &lt;/b>
WHERE table_name = 'sales'
ORDER BY partition_ordinal_position;
ALTER TABLE sales &lt;b>REORGANIZE PARTITION&lt;/b> s999 INTO
(PARTITION s6 VALUES LESS THAN (202007),
PARTITION s7 VALUES LESS THAN (202008),
PARTITION s999 VALUES LESS THAN (MAXVALUE)
);
&lt;/pre>
&lt;h3 id="list-partitioning">List partitioning&lt;/h3>
&lt;pre>&lt;code class="language-sql">-- Partitioning clause that ends the CREATE TABLE statement
PARTITION BY LIST COLUMNS (geo_region_cd)
(PARTITION ASIA VALUES IN ('CHN','JPN','IND'));

-- Add a value to an existing partition
ALTER TABLE sales REORGANIZE PARTITION ASIA INTO
(PARTITION ASIA VALUES IN ('CHN','JPN','IND', 'KOR'));
&lt;/code>&lt;/pre>
&lt;h3 id="hash-partitioning">Hash partitioning&lt;/h3>
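&lt;p>Hash partitioning aims to distribute rows evenly across a fixed number of partitions. The server does this by applying a &lt;em>hashing function&lt;/em> to the value of the partition key column.&lt;/p>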
&lt;pre>&lt;code class="language-sql">PARTITION BY HASH (cust_id)
PARTITIONS 4
(PARTITION H1,
PARTITION H2,
PARTITION H3,
PARTITION H4
);
&lt;/code>&lt;/pre>
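&lt;p>One way to check how evenly the rows were spread is to look at the row count per partition. The query below is a minimal sketch that assumes a MySQL server and a hash-partitioned table named &lt;code>sales&lt;/code>; note that &lt;code>table_rows&lt;/code> is only an estimate for InnoDB tables.&lt;/p>
&lt;pre>&lt;code class="language-sql">-- Approximate number of rows stored in each partition of sales
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_name = 'sales'
ORDER BY partition_ordinal_position;
&lt;/code>&lt;/pre>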
&lt;h3 id="composite-partitioning">Composite partitioning&lt;/h3>
&lt;p>If you need finer-grained control of how data is allocated to your partitions, you can employ &lt;em>composite partitioning&lt;/em>, which allows you to use two different types of partitioning for the same table. With composite partitioning, the first partitioning method defines the partitions, and the second partitioning method defines the &lt;em>subpartitions&lt;/em>.&lt;/p>
&lt;pre>&lt;code class="language-sql">CREATE TABLE sales
(sale_id INT NOT NULL,
cust_id INT NOT NULL,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
amount DECIMAL(9,2)
)
PARTITION BY RANGE (yearweek(sale_date))
SUBPARTITION BY HASH (cust_id)
(PARTITION s1 VALUES LESS THAN (202002)
(SUBPARTITION s1_h1, SUBPARTITION s1_h2, SUBPARTITION s1_h3, SUBPARTITION s1_h4),
PARTITION s2 VALUES LESS THAN (202003)
(SUBPARTITION s2_h1, SUBPARTITION s2_h2, SUBPARTITION s2_h3, SUBPARTITION s2_h4),
PARTITION s3 VALUES LESS THAN (202004)
(SUBPARTITION s3_h1, SUBPARTITION s3_h2,
SUBPARTITION s3_h3,
SUBPARTITION s3_h4),
PARTITION s4 VALUES LESS THAN (202005)
(SUBPARTITION s4_h1, SUBPARTITION s4_h2, SUBPARTITION s4_h3, SUBPARTITION s4_h4),
PARTITION s5 VALUES LESS THAN (202006)
(SUBPARTITION s5_h1, SUBPARTITION s5_h2, SUBPARTITION s5_h3, SUBPARTITION s5_h4),
PARTITION s999 VALUES LESS THAN (MAXVALUE)
(SUBPARTITION s999_h1, SUBPARTITION s999_h2, SUBPARTITION s999_h3,
SUBPARTITION s999_h4)
);
SELECT *
FROM sales PARTITION (s3);
SELECT *
FROM sales PARTITION (s3_h3);
&lt;/code>&lt;/pre>
&lt;h2 id="partitioning-benefits">Partitioning Benefits&lt;/h2>
&lt;p>One major advantage to partitioning is that you may only need to interact with as few as one partition, rather than the entire table. When the server skips the partitions that a query cannot need, this is called &lt;em>partition pruning&lt;/em>.&lt;/p>
&lt;p>If you execute a query that includes a join to a partitioned table and the query includes a condition on the partitioning column, the server can exclude any partitions that do not contain data pertinent to the query. This is known as a &lt;em>partitionwise join&lt;/em>, and it is similar to partition pruning in that only those partitions that contain data needed by the query will be included.&lt;/p>
&lt;p>From an administrative standpoint, one of the main benefits to partitioning is the ability to quickly delete data that is no longer needed.&lt;/p>
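&lt;p>For instance, with the range-partitioned &lt;code>sales&lt;/code> table from earlier, an entire week of old data can be removed without a row-by-row &lt;code>DELETE&lt;/code>. This is a sketch in MySQL syntax; the partition names &lt;code>s1&lt;/code> and &lt;code>s2&lt;/code> are taken from the example table and would differ in your own schema.&lt;/p>
&lt;pre>&lt;code class="language-sql">-- Remove every row in partition s1 along with the partition itself
ALTER TABLE sales DROP PARTITION s1;

-- Alternatively, empty partition s2 but keep its definition
ALTER TABLE sales TRUNCATE PARTITION s2;
&lt;/code>&lt;/pre>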
&lt;p>Another administrative advantage to partitioned tables is the ability to perform updates on multiple partitions simultaneously, which can greatly reduce the time needed to touch every row in a table.&lt;/p>
&lt;h1 id="clustering">Clustering&lt;/h1>
&lt;p>&lt;em>Clustering&lt;/em> allows multiple servers to act as a single database.&lt;/p>
&lt;p>Shared-disk/shared-cache configurations: every server in the cluster has access to all disks, and data cached in one server can be accessed by any other server in the cluster. With this type of architecture, an application server could attach to any one of the database servers in the cluster, with connections automatically failing over to another server in the cluster in case of failure.&lt;/p>
&lt;p>Of the commercial database vendors, Oracle is the leader in this space, with many of the world’s biggest companies using the Oracle Exadata platform to host extremely large databases accessed by thousands of concurrent users. However, even this platform fails to meet the needs of the biggest companies, which led Google, Facebook, Amazon, and other companies to blaze new trails.&lt;/p>
&lt;h1 id="sharding">Sharding&lt;/h1>
&lt;p>&lt;em>Sharding&lt;/em> partitions the data across multiple databases (called &lt;em>shards&lt;/em>), so it is similar to table partitioning but on a larger scale and with far more complexity. If you were to employ this strategy for a social media company with a billion users, you might decide to implement 100 separate databases, each one hosting the data for approximately 10 million users.&lt;/p>
&lt;ul>
&lt;li>You will need to choose a &lt;em>sharding key&lt;/em>, which is the value used to determine to which database to connect.&lt;/li>
&lt;li>While large tables will be divided into pieces, with individual rows assigned to a single shard, smaller reference tables may need to be replicated to all shards, and a strategy needs to be defined for how reference data can be modified and changes propagated to all shards.&lt;/li>
&lt;li>If individual shards become too large (e.g., the social media company now has two billion users), you will need a plan for adding more shards and redistributing data across the shards.&lt;/li>
&lt;li>When you need to make schema changes, you will need to have a strategy for deploying the changes across all of the shards so that all schemas stay in sync.&lt;/li>
&lt;li>If application logic needs to access data stored in two or more shards, you need to have a strategy for how to query across multiple databases and also how to implement transactions across multiple databases.&lt;/li>
&lt;/ul>
&lt;h1 id="big-data">Big Data&lt;/h1>
&lt;p>One way to define the boundaries of big data is with the “3 Vs”:&lt;/p>
&lt;p>&lt;em>Volume&lt;/em>&lt;/p>
&lt;p>In this context, volume generally means billions or trillions of data points.&lt;/p>
&lt;p>&lt;em>Velocity&lt;/em>&lt;/p>
&lt;p>This is a measure of how quickly data arrives.&lt;/p>
&lt;p>&lt;em>Variety&lt;/em>&lt;/p>
&lt;p>This means that data is not always structured (as in rows and columns in a relational database) but can also be unstructured (e.g., emails, videos, photos, audio files, etc.).&lt;/p>
&lt;p>So, one way to characterize big data is any system designed to handle a huge amount of data of various formats arriving at a rapid pace.&lt;/p>
&lt;h2 id="hadoop">Hadoop&lt;/h2>
&lt;p>Hadoop is best described as an &lt;em>ecosystem&lt;/em>, or a set of technologies and tools that work together. Some of the major components of Hadoop include:&lt;/p>
&lt;p>&lt;em>Hadoop Distributed File System (HDFS)&lt;/em>&lt;/p>
&lt;p>Like the name implies, HDFS enables file management across a large number of servers.&lt;/p>
&lt;p>&lt;em>MapReduce&lt;/em>&lt;/p>
&lt;p>This technology processes large amounts of structured and unstructured data by breaking a task into many small pieces that can be run in parallel across many servers.&lt;/p>
&lt;p>&lt;em>YARN&lt;/em>&lt;/p>
&lt;p>This is the resource manager and job scheduler for a Hadoop cluster.&lt;/p>
&lt;p>Together, these technologies allow for the storage and processing of files across hundreds or even thousands of servers acting as a single logical system. While Hadoop is widely used, querying the data using MapReduce generally requires a programmer, which has led to the development of several SQL interfaces, including Hive, Impala, and Drill.&lt;/p>
&lt;h2 id="nosql-and-document-databases">NoSQL and Document Databases&lt;/h2>
&lt;p>What happens, however, if the structure of the data isn’t known beforehand or if the structure is known but changes frequently? The answer for many companies is to combine both the data and schema definition into documents using a format such as XML or JSON and then store the documents in a database. By doing so, various types of data can be stored in the same database without the need to make schema modifications, which makes storage easier but puts the burden on query and analytic tools to make sense of the data stored in the documents.&lt;/p>
&lt;p>Document databases are a subset of what are called NoSQL databases, which typically store data using a simple key-value mechanism. For example, using a document database such as MongoDB, you could utilize the customer ID as the key to store a JSON document containing all of the customer’s data, and other users can read the schema stored within the document to make sense of the data stored within.&lt;/p>
&lt;h2 id="cloud-computing">Cloud Computing&lt;/h2>
&lt;p>Prior to the advent of big data, most companies had to build their own data centers to house the database, web, and application servers used across the enterprise. With the advent of cloud computing, you can choose to essentially outsource your data center to platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud. One of the biggest benefits to hosting your services in the cloud is &lt;strong>instant scalability&lt;/strong>, which allows you to quickly dial up or down the amount of computing power needed to run your services. Startups love these platforms because they can start writing code without spending any money up front for servers, storage, networks, or software licenses.&lt;/p>
&lt;p>As far as databases are concerned, a quick look at AWS’s database and analytics offerings yields the following options:&lt;/p>
&lt;ul>
&lt;li>Relational databases (MySQL, Aurora, PostgreSQL, MariaDB, Oracle, and SQL Server)&lt;/li>
&lt;li>In-memory database (ElastiCache)&lt;/li>
&lt;li>Data warehousing database (Redshift)&lt;/li>
&lt;li>NoSQL database (DynamoDB)&lt;/li>
&lt;li>Document database (DocumentDB)&lt;/li>
&lt;li>Graph database (Neptune)&lt;/li>
&lt;li>Time-series database (Timestream)&lt;/li>
&lt;li>Hadoop (EMR)&lt;/li>
&lt;li>Data lakes (Lake Formation)&lt;/li>
&lt;/ul>
&lt;p>While relational databases dominated the landscape up until the mid-2000s, it’s pretty easy to see that companies are now mixing and matching various platforms and that relational databases may become less popular over time.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Databases are getting larger, but at the same time storage, clustering, and partitioning technologies are becoming more robust. Working with huge amounts of data can be quite challenging, regardless of the technology stack. Whether you use relational databases, big data platforms, or a variety of database servers, SQL is evolving to facilitate data retrieval from various technologies.&lt;/p></description></item><item><title>Learning SQL Notes #14: Analytic Functions</title><link>https://siqi-zheng.rbind.io/post/2021-06-11-sql-notes-14/</link><pubDate>Fri, 11 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-11-sql-notes-14/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#analytic-function-concepts">Analytic Function Concepts&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#data-windows">Data Windows&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#localized-sorting">Localized Sorting&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#ranking">Ranking&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#ranking-functions">Ranking Functions&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#generating-multiple-rankings">Generating Multiple Rankings&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#reporting-functions">Reporting Functions&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#window-frames">Window Frames&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#lag-and-lead">Lag and Lead&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#column-value-concatenation">Column Value Concatenation&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h1 id="analytic-function-concepts">Analytic Function Concepts&lt;/h1>
&lt;h2 id="data-windows">Data Windows&lt;/h2>
&lt;pre>
SELECT quarter(payment_date) quarter,
monthname(payment_date) month_nm,
sum(amount) monthly_sales,
&lt;b>max(sum(amount))
over () max_overall_sales,&lt;/b> /* empty over(): the window is the whole result set, so this is the highest monthly total in 2005 */
&lt;b>max(sum(amount))
over (partition by quarter(payment_date)) max_qrtr_sales&lt;/b> /* one window per quarter, so this is the highest monthly total within each quarter of 2005 */
FROM payment
WHERE year(payment_date) = 2005
GROUP BY quarter(payment_date), monthname(payment_date);
&lt;/pre>
&lt;p>The analytic functions used to generate these additional columns group rows into two different sets: one set containing all rows in the same quarter and another set containing all of the rows. To accommodate this type of analysis, analytic functions include the ability to group rows into &lt;em>windows&lt;/em>, which effectively partition the data for use by the analytic function without changing the overall result set. Windows are defined using the &lt;code>over&lt;/code> clause combined with an optional &lt;code>partition by&lt;/code> subclause. In the previous query, both analytic functions include an &lt;code>over&lt;/code> clause, but the first one is empty, indicating that the window should include the entire result set, whereas the second one specifies that the window should include only rows within the same quarter. Data windows may contain anywhere from a single row to all of the rows in the result set, and different analytic functions can define different data windows.&lt;/p>
&lt;h2 id="localized-sorting">Localized Sorting&lt;/h2>
&lt;pre>
SELECT quarter(payment_date) quarter,
monthname(payment_date) month_nm,
sum(amount) monthly_sales,
&lt;b>rank() over (order by sum(amount) desc)&lt;/b> sales_rank /* this order by determines only the ranking */
FROM payment
WHERE year(payment_date) = 2005
GROUP BY quarter(payment_date), monthname(payment_date)
ORDER BY 1, month(payment_date); /* this order by determines only the order of the result set */
&lt;/pre>
&lt;p>Alternatively, add &lt;code>partition by quarter(payment_date)&lt;/code> to the &lt;code>over&lt;/code> clause above to rank the months within each quarter.&lt;/p>
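&lt;p>The per-quarter variant can be sketched as follows. It mirrors the query above against the Sakila &lt;code>payment&lt;/code> table; the alias &lt;code>qtr_sales_rank&lt;/code> is just an illustrative name.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT quarter(payment_date) quarter,
  monthname(payment_date) month_nm,
  sum(amount) monthly_sales,
  -- the ranking restarts at 1 inside each quarter's window
  rank() over (partition by quarter(payment_date)
               order by sum(amount) desc) qtr_sales_rank
FROM payment
WHERE year(payment_date) = 2005
GROUP BY quarter(payment_date), monthname(payment_date)
ORDER BY 1, month(payment_date);
&lt;/code>&lt;/pre>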
&lt;h1 id="ranking">Ranking&lt;/h1>
&lt;h2 id="ranking-functions">Ranking Functions&lt;/h2>
&lt;p>There are multiple ranking functions available in the SQL standard, with each one taking a different approach to how ties are handled:&lt;/p>
&lt;p>&lt;code>row_number&lt;/code>&lt;/p>
&lt;p>Returns a unique number for each row, with rankings arbitrarily assigned in case of a tie&lt;/p>
&lt;p>&lt;code>rank&lt;/code>&lt;/p>
&lt;p>Returns the same ranking in case of a tie, with gaps in the rankings&lt;/p>
&lt;p>&lt;code>dense_rank&lt;/code>&lt;/p>
&lt;p>Returns the same ranking in case of a tie, with no gaps in the rankings&lt;/p>
&lt;pre>
SELECT customer_id, count(*) num_rentals,
row_number() over (order by count(*) desc) row_number_rnk,
rank() over (order by count(*) desc) rank_rnk,
dense_rank() over (order by count(*) desc) dense_rank_rnk
FROM rental
GROUP BY customer_id
ORDER BY 2 desc;
&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">customer_id&lt;/th>
&lt;th>num_rentals&lt;/th>
&lt;th>row_number_rnk&lt;/th>
&lt;th>rank_rnk&lt;/th>
&lt;th align="right">dense_rank_rnk&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">144&lt;/td>
&lt;td>42&lt;/td>
&lt;td>3&lt;/td>
&lt;td>3&lt;/td>
&lt;td align="right">3&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">236&lt;/td>
&lt;td>42&lt;/td>
&lt;td>4&lt;/td>
&lt;td>3&lt;/td>
&lt;td align="right">3&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">&lt;b>75&lt;/b>&lt;/td>
&lt;td>&lt;b>41&lt;/b>&lt;/td>
&lt;td>&lt;b>5&lt;/b>&lt;/td>
&lt;td>&lt;b>5&lt;/b>&lt;/td>
&lt;td align="right">&lt;b>4&lt;/b>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Given these three functions, how would you identify the top 10 customers? There are three possible solutions:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Use the row_number function to identify customers ranked from 1 to 10, which results in exactly 10 customers in this example, but in other cases might exclude customers having the same number of rentals as the 10th ranked customer.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the rank function to identify customers ranked 10 or less, which also results in exactly 10 customers.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the dense_rank function to identify customers ranked 10 or less, which yields a list of 37 customers.&lt;/p>
&lt;/li>
&lt;/ul>
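&lt;p>Whichever function you choose, the filter has to be applied in an outer query, because the ranking is not available to the &lt;code>where&lt;/code> clause of the query that computes it. The following is a minimal sketch using &lt;code>row_number&lt;/code>; swapping in &lt;code>rank()&lt;/code> or &lt;code>dense_rank()&lt;/code> gives the other two variants. The subquery alias &lt;code>cust_rankings&lt;/code> is an arbitrary name.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT customer_id, num_rentals, row_number_rnk
FROM
 (SELECT customer_id, count(*) num_rentals,
    row_number() over (order by count(*) desc) row_number_rnk
  FROM rental
  GROUP BY customer_id
 ) cust_rankings
WHERE row_number_rnk &lt;= 10   -- keep only the ten highest-ranked customers
ORDER BY row_number_rnk;
&lt;/code>&lt;/pre>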
&lt;h2 id="generating-multiple-rankings">Generating Multiple Rankings&lt;/h2>
&lt;pre>
SELECT customer_id,
monthname(rental_date) rental_month,
count(*) num_rentals,
rank() over (&lt;b>partition by monthname(rental_date) &lt;/b>
order by count(*) desc) rank_rnk
FROM rental
GROUP BY customer_id, monthname(rental_date)
ORDER BY 2, 3 desc;
&lt;/pre>
&lt;p>The &lt;code>partition by&lt;/code> clause resets the ranking to 1 for each month. To generate the desired results for the marketing department (the top five customers from each month), wrap the previous query in a subquery and add a filter condition that excludes any rows with a ranking higher than five:&lt;/p>
&lt;pre>
SELECT customer_id, rental_month, num_rentals, rank_rnk ranking
FROM
(SELECT customer_id,
monthname(rental_date) rental_month, count(*) num_rentals,
rank() over (partition by monthname(rental_date) order by count(*) desc) rank_rnk
FROM rental
GROUP BY customer_id, monthname(rental_date)
) cust_rankings
&lt;b>WHERE rank_rnk &lt;= 5&lt;/b>
ORDER BY rental_month, num_rentals desc, rank_rnk;
&lt;/pre>
&lt;p>Because analytic functions can appear only in the SELECT clause (and in ORDER BY), you will often need to &lt;strong>nest queries&lt;/strong> to filter or group on their results.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Window Function&lt;/th>
&lt;th>Return Type&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>CUME_DIST()&lt;/td>
&lt;td>DOUBLE PRECISION&lt;/td>
&lt;td>The CUME_DIST() window function calculates the relative rank of the current row within a window partition: (number of rows preceding or peer with current row) / (total rows in the window partition)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DENSE_RANK()&lt;/td>
&lt;td>BIGINT&lt;/td>
&lt;td>The DENSE_RANK() window function determines the rank of a value in a group of values based on the ORDER BY expression and the OVER clause. Each value is ranked within its partition. Rows with equal values receive the same rank. There are no gaps in the sequence of ranked values if two or more rows have the same rank.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>NTILE()&lt;/td>
&lt;td>INTEGER&lt;/td>
&lt;td>The NTILE window function divides the rows for each window partition, as equally as possible, into a specified number of ranked groups. The NTILE window function requires the ORDER BY clause in the OVER clause.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PERCENT_RANK()&lt;/td>
&lt;td>DOUBLE PRECISION&lt;/td>
&lt;td>The PERCENT_RANK() window function calculates the percent rank of the current row using the following formula: (x - 1) / (number of rows in window partition - 1) where x is the rank of the current row.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RANK()&lt;/td>
&lt;td>BIGINT&lt;/td>
&lt;td>The RANK window function determines the rank of a value in a group of values. The ORDER BY expression in the OVER clause determines the value. Each value is ranked within its partition. Rows with equal values for the ranking criteria receive the same rank. Drill adds the number of tied rows to the tied rank to calculate the next rank and thus the ranks might not be consecutive numbers. For example, if two rows are ranked 1, the next rank is 3. The DENSE_RANK window function differs in that no gaps exist if two or more rows tie.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ROW_NUMBER()&lt;/td>
&lt;td>BIGINT&lt;/td>
&lt;td>The ROW_NUMBER window function determines the ordinal number of the current row within its partition. The ORDER BY expression in the OVER clause determines the number. Each value is ordered within its partition. Rows with equal values for the ORDER BY expressions receive different row numbers nondeterministically.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
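&lt;p>The two formulas in the table (CUME_DIST and PERCENT_RANK) can be checked by hand. Below is a rough Python sketch over a single partition ordered ascending; the function name and the sample values are assumptions made for illustration.&lt;/p>

```python
def percent_rank_and_cume_dist(values):
    """Return (value, percent_rank, cume_dist) per row for one ascending partition."""
    ordered = sorted(values)
    n = len(ordered)
    out = []
    for v in ordered:
        # rank(): one more than the number of rows strictly before this value
        rank = 1 + sum(1 for other in ordered if v > other)
        # rows preceding the current row or tied with it (its peers)
        preceding_or_peer = sum(1 for other in ordered if v >= other)
        percent_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        out.append((v, percent_rank, preceding_or_peer / n))
    return out


# Hypothetical values with one tie.
dist_rows = percent_rank_and_cume_dist([10, 20, 20, 40])
```

&lt;p>The tied rows share both results: each 20 has a percent rank of 1/3 and a cumulative distribution of 0.75, because three of the four rows precede it or are its peers.&lt;/p>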
&lt;h1 id="reporting-functions">Reporting Functions&lt;/h1>
&lt;p>For each payment of 10 or more, show the monthly total and the grand total alongside the payment amount:&lt;/p>
&lt;pre>
SELECT monthname(payment_date) payment_month,
amount,
&lt;b>sum(amount) over (partition by monthname(payment_date)) monthly_total,
sum(amount) over () grand_total &lt;/b>
FROM payment
WHERE amount >= 10
ORDER BY 1;
&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">payment_month&lt;/th>
&lt;th>amount&lt;/th>
&lt;th>monthly_total&lt;/th>
&lt;th align="right">grand_total&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">August&lt;/td>
&lt;td>10.99&lt;/td>
&lt;td>521.53&lt;/td>
&lt;td align="right">1262.86&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">August&lt;/td>
&lt;td>11.99&lt;/td>
&lt;td>521.53&lt;/td>
&lt;td align="right">1262.86&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Calculate each month’s percentage of the overall total:&lt;/p>
&lt;pre>
SELECT monthname(payment_date) payment_month,
sum(amount) month_total,
&lt;b>round(sum(amount) / sum(sum(amount)) over () * 100, 2) pct_of_total&lt;/b>
FROM payment
GROUP BY monthname(payment_date);
&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">payment_month&lt;/th>
&lt;th>month_total&lt;/th>
&lt;th align="right">pct_of_total&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">May&lt;/td>
&lt;td>4824.43&lt;/td>
&lt;td align="right">7.16&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">June&lt;/td>
&lt;td>9631.88&lt;/td>
&lt;td align="right">14.29&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">July&lt;/td>
&lt;td>28373.89&lt;/td>
&lt;td align="right">42.09&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">August&lt;/td>
&lt;td>24072.13&lt;/td>
&lt;td align="right">35.71&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">February&lt;/td>
&lt;td>514.18&lt;/td>
&lt;td align="right">0.76&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
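&lt;p>The expression &lt;code>sum(sum(amount)) over ()&lt;/code> first totals each month (the inner sum, from the group by) and then adds up those monthly totals across the whole result set (the outer sum, from the empty window). As a sanity check, this small Python sketch repeats the arithmetic on the month totals shown in the table above; the variable names are only illustrative.&lt;/p>

```python
# Monthly totals copied from the result table above.
month_totals = {
    "May": 4824.43,
    "June": 9631.88,
    "July": 28373.89,
    "August": 24072.13,
    "February": 514.18,
}

# Equivalent of sum(sum(amount)) over (): the sum of the per-month sums.
grand_total = sum(month_totals.values())

# Equivalent of round(sum(amount) / grand_total * 100, 2) for each month.
pct_of_total = {
    month: round(total / grand_total * 100, 2)
    for month, total in month_totals.items()
}
```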
&lt;p>Quasi-ranking functions:&lt;/p>
&lt;pre>
SELECT monthname(payment_date) payment_month,
sum(amount) month_total,
&lt;b>CASE sum(amount)
WHEN max(sum(amount)) over () THEN 'Highest'
WHEN min(sum(amount)) over () THEN 'Lowest'
ELSE 'Middle'
END descriptor&lt;/b>
FROM payment
GROUP BY monthname(payment_date);
&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">payment_month&lt;/th>
&lt;th>month_total&lt;/th>
&lt;th align="right">descriptor&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">May&lt;/td>
&lt;td>4824.43&lt;/td>
&lt;td align="right">Middle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">June&lt;/td>
&lt;td>9631.88&lt;/td>
&lt;td align="right">Middle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">July&lt;/td>
&lt;td>28373.89&lt;/td>
&lt;td align="right">&lt;b>Highest&lt;/b>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">August&lt;/td>
&lt;td>24072.13&lt;/td>
&lt;td align="right">Middle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">February&lt;/td>
&lt;td>514.18&lt;/td>
&lt;td align="right">&lt;b>Lowest&lt;/b>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="window-frames">Window Frames&lt;/h2>
&lt;pre>
SELECT yearweek(payment_date) payment_week,
sum(amount) week_total,
sum(sum(amount))
&lt;b>over (order by yearweek(payment_date)
rows unbounded preceding)&lt;/b> rolling_sum
FROM payment
GROUP BY yearweek(payment_date)
ORDER BY 1;
&lt;/pre>
&lt;pre>
SELECT yearweek(payment_date) payment_week,
sum(amount) week_total,
avg(sum(amount))
over (order by yearweek(payment_date)
&lt;b>rows between 1 preceding and 1 following&lt;/b>) rolling_3wk_avg
FROM payment
GROUP BY yearweek(payment_date)
ORDER BY 1;
&lt;/pre>
&lt;pre>
SELECT date(payment_date), sum(amount),
avg(sum(amount))
over (order by date(payment_date)
&lt;b>range between interval 3 day preceding and interval 3 day following&lt;/b>) avg_7_day
FROM payment
WHERE payment_date BETWEEN '2005-07-01' AND '2005-09-01'
GROUP BY date(payment_date)
ORDER BY 1;
&lt;/pre>
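&lt;p>To make the frame clauses concrete, here is a small Python sketch of what the first two queries compute over an ordered list of weekly totals. The function names and the sample totals are hypothetical, and the sketch only mimics the row-based frames.&lt;/p>

```python
from itertools import accumulate


def rolling_sum(values):
    """rows unbounded preceding: the frame runs from the first row to the current row."""
    return list(accumulate(values))


def centered_avg(values, preceding=1, following=1):
    """rows between N preceding and M following: the frame is clipped at the edges."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + following + 1]
        out.append(sum(frame) / len(frame))
    return out


week_totals = [10.0, 20.0, 30.0, 40.0]  # hypothetical weekly totals
running = rolling_sum(week_totals)
three_week_avg = centered_avg(week_totals)
```

&lt;p>The first and last rows average only two values because there is no neighbor on one side. A &lt;code>range&lt;/code> frame, as in the third query, differs in that it selects rows by the value of the ordering expression (dates within three days) rather than by counting rows, so gaps in the dates shrink the frame.&lt;/p>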
&lt;h2 id="lag-and-lead">Lag and Lead&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Window Function&lt;/th>
&lt;th>Argument Type&lt;/th>
&lt;th>Return Type&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>LAG()&lt;/td>
&lt;td>Any supported Drill data types&lt;/td>
&lt;td>Same as the expression type&lt;/td>
&lt;td>The LAG() window function returns the value for the row before the current row in a partition. If no row exists, null is returned.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LEAD()&lt;/td>
&lt;td>Any supported Drill data types&lt;/td>
&lt;td>Same as the expression type&lt;/td>
&lt;td>The LEAD() window function returns the value for the row after the current row in a partition. If no row exists, null is returned.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FIRST_VALUE&lt;/td>
&lt;td>Any supported Drill data types&lt;/td>
&lt;td>Same as the expression type&lt;/td>
&lt;td>The FIRST_VALUE window function returns the value of the specified expression with respect to the first row in the window frame.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LAST_VALUE&lt;/td>
&lt;td>Any supported Drill data types&lt;/td>
&lt;td>Same as the expression type&lt;/td>
&lt;td>The LAST_VALUE window function returns the value of the specified expression with respect to the last row in the window frame.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>
SELECT yearweek(payment_date) payment_week,
sum(amount) week_total,
&lt;b>lag(sum(amount), 1)
over (order by yearweek(payment_date)) prev_wk_tot,&lt;/b>
&lt;b>lead(sum(amount), 1)
over (order by yearweek(payment_date)) next_wk_tot&lt;/b>
FROM payment
GROUP BY yearweek(payment_date)
ORDER BY 1;
&lt;/pre>
&lt;pre>
SELECT yearweek(payment_date) payment_week,
sum(amount) week_total,
&lt;b>round((sum(amount) - lag(sum(amount), 1)
over (order by yearweek(payment_date))) / lag(sum(amount), 1)
over (order by yearweek(payment_date)) * 100, 1) pct_diff&lt;/b>
FROM payment
GROUP BY yearweek(payment_date)
ORDER BY 1;
&lt;/pre>
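&lt;p>As an illustration of what the two queries above return, the following Python sketch shifts a list of weekly totals by one position and derives the week-over-week percentage change. The helper names and the sample totals are assumptions, not part of the original example.&lt;/p>

```python
def lag(values, offset=1):
    """Value from `offset` rows earlier, or None when no such row exists."""
    return [values[i - offset] if i >= offset else None
            for i in range(len(values))]


def lead(values, offset=1):
    """Value from `offset` rows later, or None when no such row exists."""
    return [values[i + offset] if len(values) > i + offset else None
            for i in range(len(values))]


def pct_diff(values):
    """Percentage change from the previous row, rounded to one decimal place."""
    out = []
    for current, previous in zip(values, lag(values)):
        # None for the first row (no previous week) and when the previous total is zero
        if not previous:
            out.append(None)
        else:
            out.append(round((current - previous) / previous * 100, 1))
    return out


weekly = [100.0, 150.0, 120.0]  # hypothetical weekly totals
prev_wk_tot = lag(weekly)
next_wk_tot = lead(weekly)
changes = pct_diff(weekly)
```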
&lt;h2 id="column-value-concatenation">Column Value Concatenation&lt;/h2>
&lt;pre>
SELECT f.title,
&lt;b>group_concat(a.last_name order by a.last_name separator ', ') actors&lt;/b>
FROM actor a
INNER JOIN film_actor fa
ON a.actor_id = fa.actor_id
INNER JOIN film f
ON fa.film_id = f.film_id
GROUP BY f.title
HAVING count(*) = 3;
&lt;/pre></description></item><item><title>Learning SQL Notes #13: Metadata</title><link>https://siqi-zheng.rbind.io/post/2021-06-10-sql-notes-13/</link><pubDate>Thu, 10 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-10-sql-notes-13/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#data-about-data">Data About Data&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#information_schema">information_schema&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#working-with-metadata">Working with Metadata&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#schema-generation-scripts">Schema Generation Scripts&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#deployment-verification">Deployment Verification&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#dynamic-sql-generation">Dynamic SQL Generation&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>Along with the data that users insert, a database server also needs to store information about all of the database objects (tables, views, indexes, etc.) that were created to hold this data. This chapter discusses how and where this information, known as &lt;em>metadata&lt;/em>, is stored, how you can access it, and how you can use it to build flexible systems.&lt;/p>
&lt;h1 id="data-about-data">Data About Data&lt;/h1>
&lt;p>Metadata is essentially data about data. Every time you create a database object, the database server needs to record various pieces of information. For example, if you were to create a table with multiple columns, a primary key constraint, three indexes, and a foreign key constraint, the database server would need to store all the following information:&lt;/p>
&lt;ul>
&lt;li>Table name&lt;/li>
&lt;li>Table storage information (tablespace, initial size, etc.)&lt;/li>
&lt;li>Storage engine&lt;/li>
&lt;li>Column names&lt;/li>
&lt;li>Column data types&lt;/li>
&lt;li>Default column values&lt;/li>
&lt;li>&lt;code>not null&lt;/code> column constraints&lt;/li>
&lt;li>Primary key columns&lt;/li>
&lt;li>Primary key name&lt;/li>
&lt;li>Name of primary key index&lt;/li>
&lt;li>Index names&lt;/li>
&lt;li>Index types (B-tree, bitmap)&lt;/li>
&lt;li>Indexed columns&lt;/li>
&lt;li>Index column sort order (ascending or descending)&lt;/li>
&lt;li>Index storage information&lt;/li>
&lt;li>Foreign key name&lt;/li>
&lt;li>Foreign key columns&lt;/li>
&lt;li>Associated table/columns for foreign keys&lt;/li>
&lt;/ul>
&lt;p>This data is collectively known as the &lt;em>data dictionary&lt;/em> or &lt;em>system catalog&lt;/em>. The database server needs to store this data persistently, and it needs to be able to quickly retrieve this data in order to verify and execute SQL statements. Additionally, the database server must safeguard this data so that it can be modified only via an appropriate mechanism, such as the &lt;code>alter table&lt;/code> statement.&lt;/p>
&lt;p>Every database server uses a different mechanism to publish metadata, such as:&lt;/p>
&lt;ul>
&lt;li>A set of views, such as Oracle Database’s user_tables and all_constraints views&lt;/li>
&lt;li>A set of system-stored procedures, such as SQL Server’s sp_tables procedure or Oracle Database’s dbms_metadata package&lt;/li>
&lt;li>A special database, such as MySQL’s information_schema database&lt;/li>
&lt;/ul>
&lt;h1 id="information_schema">information_schema&lt;/h1>
&lt;p>All of the objects available within the information_schema database (or &lt;em>schema&lt;/em>, in the case of SQL Server) are views. Unlike the describe utility, the views within information_schema can be queried and, thus, used programmatically.&lt;/p>
&lt;table frame="box" rules="all" summary="A reference that lists all INFORMATION_SCHEMA tables.">&lt;col style="width: 22%">&lt;col style="width: 55%">&lt;col style="width: 11%">&lt;col style="width: 11%">&lt;thead>&lt;tr>&lt;th>Table Name&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Introduced&lt;/th>
&lt;th>Deprecated&lt;/th>
&lt;/tr>&lt;/thead>&lt;tbody>&lt;tr>&lt;th scope="row">&lt;code class="literal">ADMINISTRABLE_ROLE_AUTHORIZATIONS&lt;/code>&lt;/th>
&lt;td>Grantable users or roles for current user or role&lt;/td>
&lt;td>8.0.19&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">APPLICABLE_ROLES&lt;/code>&lt;/th>
&lt;td>Applicable roles for current user&lt;/td>
&lt;td>8.0.19&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">CHARACTER_SETS&lt;/code>&lt;/th>
&lt;td>Available character sets&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">CHECK_CONSTRAINTS&lt;/code>&lt;/th>
&lt;td>Table and column CHECK constraints&lt;/td>
&lt;td>8.0.16&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">COLLATION_CHARACTER_SET_APPLICABILITY&lt;/code>&lt;/th>
&lt;td>Character set applicable to each collation&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">COLLATIONS&lt;/code>&lt;/th>
&lt;td>Collations for each character set&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">COLUMN_PRIVILEGES&lt;/code>&lt;/th>
&lt;td>Privileges defined on columns&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">COLUMN_STATISTICS&lt;/code>&lt;/th>
&lt;td>Histogram statistics for column values&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">COLUMNS&lt;/code>&lt;/th>
&lt;td>Columns in each table&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">COLUMNS_EXTENSIONS&lt;/code>&lt;/th>
&lt;td>Column attributes for primary and secondary storage engines&lt;/td>
&lt;td>8.0.21&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS&lt;/code>&lt;/th>
&lt;td>Current number of consecutive failed connection attempts per account&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ENABLED_ROLES&lt;/code>&lt;/th>
&lt;td>Roles enabled within current session&lt;/td>
&lt;td>8.0.19&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ENGINES&lt;/code>&lt;/th>
&lt;td>Storage engine properties&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">EVENTS&lt;/code>&lt;/th>
&lt;td>Event Manager events&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">FILES&lt;/code>&lt;/th>
&lt;td>Files that store tablespace data&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_BUFFER_PAGE&lt;/code>&lt;/th>
&lt;td>Pages in InnoDB buffer pool&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_BUFFER_PAGE_LRU&lt;/code>&lt;/th>
&lt;td>LRU ordering of pages in InnoDB buffer pool&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_BUFFER_POOL_STATS&lt;/code>&lt;/th>
&lt;td>InnoDB buffer pool statistics&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CACHED_INDEXES&lt;/code>&lt;/th>
&lt;td>Number of index pages cached per index in InnoDB buffer pool&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CMP&lt;/code>&lt;/th>
&lt;td>Status for operations related to compressed InnoDB tables&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CMP_PER_INDEX&lt;/code>&lt;/th>
&lt;td>Status for operations related to compressed InnoDB tables and indexes&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CMP_PER_INDEX_RESET&lt;/code>&lt;/th>
&lt;td>Status for operations related to compressed InnoDB tables and indexes&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CMP_RESET&lt;/code>&lt;/th>
&lt;td>Status for operations related to compressed InnoDB tables&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CMPMEM&lt;/code>&lt;/th>
&lt;td>Status for compressed pages within InnoDB buffer pool&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_CMPMEM_RESET&lt;/code>&lt;/th>
&lt;td>Status for compressed pages within InnoDB buffer pool&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_COLUMNS&lt;/code>&lt;/th>
&lt;td>Columns in each InnoDB table&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_DATAFILES&lt;/code>&lt;/th>
&lt;td>Data file path information for InnoDB file-per-table and general tablespaces&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FIELDS&lt;/code>&lt;/th>
&lt;td>Key columns of InnoDB indexes&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FOREIGN&lt;/code>&lt;/th>
&lt;td>InnoDB foreign-key metadata&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FOREIGN_COLS&lt;/code>&lt;/th>
&lt;td>InnoDB foreign-key column status information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FT_BEING_DELETED&lt;/code>&lt;/th>
&lt;td>Snapshot of INNODB_FT_DELETED table&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FT_CONFIG&lt;/code>&lt;/th>
&lt;td>Metadata for InnoDB table FULLTEXT index and associated processing&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FT_DEFAULT_STOPWORD&lt;/code>&lt;/th>
&lt;td>Default list of stopwords for InnoDB FULLTEXT indexes&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FT_DELETED&lt;/code>&lt;/th>
&lt;td>Rows deleted from InnoDB table FULLTEXT index&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FT_INDEX_CACHE&lt;/code>&lt;/th>
&lt;td>Token information for newly inserted rows in InnoDB FULLTEXT index&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_FT_INDEX_TABLE&lt;/code>&lt;/th>
&lt;td>Inverted index information for processing text searches against InnoDB table FULLTEXT index&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_INDEXES&lt;/code>&lt;/th>
&lt;td>InnoDB index metadata&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_METRICS&lt;/code>&lt;/th>
&lt;td>InnoDB performance information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_SESSION_TEMP_TABLESPACES&lt;/code>&lt;/th>
&lt;td>Session temporary-tablespace metadata&lt;/td>
&lt;td>8.0.13&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_TABLES&lt;/code>&lt;/th>
&lt;td>InnoDB table metadata&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_TABLESPACES&lt;/code>&lt;/th>
&lt;td>InnoDB file-per-table, general, and undo tablespace metadata&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_TABLESPACES_BRIEF&lt;/code>&lt;/th>
&lt;td>Brief file-per-table, general, undo, and system tablespace metadata&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_TABLESTATS&lt;/code>&lt;/th>
&lt;td>InnoDB table low-level status information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_TEMP_TABLE_INFO&lt;/code>&lt;/th>
&lt;td>Information about active user-created InnoDB temporary tables&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_TRX&lt;/code>&lt;/th>
&lt;td>Active InnoDB transaction information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">INNODB_VIRTUAL&lt;/code>&lt;/th>
&lt;td>InnoDB virtual generated column metadata&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">KEY_COLUMN_USAGE&lt;/code>&lt;/th>
&lt;td>Which key columns have constraints&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">KEYWORDS&lt;/code>&lt;/th>
&lt;td>MySQL keywords&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">MYSQL_FIREWALL_USERS&lt;/code>&lt;/th>
&lt;td>Firewall in-memory data for account profiles&lt;/td>
&lt;td>&lt;/td>
&lt;td>8.0.26&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">MYSQL_FIREWALL_WHITELIST&lt;/code>&lt;/th>
&lt;td>Firewall in-memory data for account profile allowlists&lt;/td>
&lt;td>&lt;/td>
&lt;td>8.0.26&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ndb_transid_mysql_connection_map&lt;/code>&lt;/th>
&lt;td>NDB transaction information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">OPTIMIZER_TRACE&lt;/code>&lt;/th>
&lt;td>Information produced by optimizer trace activity&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">PARAMETERS&lt;/code>&lt;/th>
&lt;td>Stored routine parameters and stored function return values&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">PARTITIONS&lt;/code>&lt;/th>
&lt;td>Table partition information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">PLUGINS&lt;/code>&lt;/th>
&lt;td>Plugin information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">PROCESSLIST&lt;/code>&lt;/th>
&lt;td>Information about currently executing threads&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">PROFILING&lt;/code>&lt;/th>
&lt;td>Statement profiling information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">REFERENTIAL_CONSTRAINTS&lt;/code>&lt;/th>
&lt;td>Foreign key information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">RESOURCE_GROUPS&lt;/code>&lt;/th>
&lt;td>Resource group information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ROLE_COLUMN_GRANTS&lt;/code>&lt;/th>
&lt;td>Column privileges for roles available to or granted by currently enabled roles&lt;/td>
&lt;td>8.0.19&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ROLE_ROUTINE_GRANTS&lt;/code>&lt;/th>
&lt;td>Routine privileges for roles available to or granted by currently enabled roles&lt;/td>
&lt;td>8.0.19&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ROLE_TABLE_GRANTS&lt;/code>&lt;/th>
&lt;td>Table privileges for roles available to or granted by currently enabled roles&lt;/td>
&lt;td>8.0.19&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ROUTINES&lt;/code>&lt;/th>
&lt;td>Stored routine information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">SCHEMA_PRIVILEGES&lt;/code>&lt;/th>
&lt;td>Privileges defined on schemas&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">SCHEMATA&lt;/code>&lt;/th>
&lt;td>Schema information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">SCHEMATA_EXTENSIONS&lt;/code>&lt;/th>
&lt;td>Schema options&lt;/td>
&lt;td>8.0.22&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ST_GEOMETRY_COLUMNS&lt;/code>&lt;/th>
&lt;td>Columns in each table that store spatial data&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ST_SPATIAL_REFERENCE_SYSTEMS&lt;/code>&lt;/th>
&lt;td>Available spatial reference systems&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">ST_UNITS_OF_MEASURE&lt;/code>&lt;/th>
&lt;td>Acceptable units for ST_Distance()&lt;/td>
&lt;td>8.0.14&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">STATISTICS&lt;/code>&lt;/th>
&lt;td>Table index statistics&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLE_CONSTRAINTS&lt;/code>&lt;/th>
&lt;td>Which tables have constraints&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLE_CONSTRAINTS_EXTENSIONS&lt;/code>&lt;/th>
&lt;td>Table constraint attributes for primary and secondary storage engines&lt;/td>
&lt;td>8.0.21&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLE_PRIVILEGES&lt;/code>&lt;/th>
&lt;td>Privileges defined on tables&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLES&lt;/code>&lt;/th>
&lt;td>Table information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLES_EXTENSIONS&lt;/code>&lt;/th>
&lt;td>Table attributes for primary and secondary storage engines&lt;/td>
&lt;td>8.0.21&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLESPACES&lt;/code>&lt;/th>
&lt;td>Tablespace information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TABLESPACES_EXTENSIONS&lt;/code>&lt;/th>
&lt;td>Tablespace attributes for primary storage engines&lt;/td>
&lt;td>8.0.21&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TP_THREAD_GROUP_STATE&lt;/code>&lt;/th>
&lt;td>Thread pool thread group states&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TP_THREAD_GROUP_STATS&lt;/code>&lt;/th>
&lt;td>Thread pool thread group statistics&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TP_THREAD_STATE&lt;/code>&lt;/th>
&lt;td>Thread pool thread information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">TRIGGERS&lt;/code>&lt;/th>
&lt;td>Trigger information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">USER_ATTRIBUTES&lt;/code>&lt;/th>
&lt;td>User comments and attributes&lt;/td>
&lt;td>8.0.21&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">USER_PRIVILEGES&lt;/code>&lt;/th>
&lt;td>Privileges defined globally per user&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">VIEW_ROUTINE_USAGE&lt;/code>&lt;/th>
&lt;td>Stored functions used in views&lt;/td>
&lt;td>8.0.13&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">VIEW_TABLE_USAGE&lt;/code>&lt;/th>
&lt;td>Tables and views used in views&lt;/td>
&lt;td>8.0.13&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;tr>&lt;th scope="row">&lt;code class="literal">VIEWS&lt;/code>&lt;/th>
&lt;td>View information&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>&lt;/tbody>&lt;/table>
&lt;h1 id="working-with-metadata">Working with Metadata&lt;/h1>
&lt;h2 id="schema-generation-scripts">Schema Generation Scripts&lt;/h2>
&lt;p>Metadata can be used to generate a script that creates the various tables, indexes, views, and so on, that a team has deployed. As an example, the following query builds a &lt;code>create table&lt;/code> statement for the sakila.category table and can serve as a template for other tables:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT 'CREATE TABLE category (' create_table_statement
UNION ALL
SELECT cols.txt
FROM
(SELECT concat(' ',column_name, ' ', column_type,
  CASE
    WHEN is_nullable = 'NO' THEN ' not null'
    ELSE ''
  END,
  CASE
    WHEN extra IS NOT NULL AND extra LIKE 'DEFAULT_GENERATED%'
      THEN concat(' DEFAULT ',column_default,substr(extra,18))
    WHEN extra IS NOT NULL THEN concat(' ', extra)
    ELSE ''
  END,
  ',') txt
FROM information_schema.columns
WHERE table_schema = 'sakila' AND table_name = 'category'
ORDER BY ordinal_position
) cols
UNION ALL
SELECT concat(' constraint primary key (')
FROM information_schema.table_constraints
WHERE table_schema = 'sakila' AND table_name = 'category'
AND constraint_type = 'PRIMARY KEY'
UNION ALL
SELECT cols.txt
FROM
(SELECT concat(CASE WHEN ordinal_position &amp;gt; 1 THEN ' ,'
ELSE ' ' END, column_name) txt
FROM information_schema.key_column_usage
WHERE table_schema = 'sakila' AND table_name = 'category'
AND constraint_name = 'PRIMARY'
ORDER BY ordinal_position
) cols
UNION ALL
SELECT ' )'
UNION ALL
SELECT ')';
&lt;/code>&lt;/pre>
&lt;h2 id="deployment-verification">Deployment Verification&lt;/h2>
&lt;p>After the deployment scripts have been run, it’s a good idea to run a verification script to ensure that the new schema objects are in place with the appropriate columns, indexes, primary keys, and so forth. Here’s a query that returns the number of columns, number of indexes, and number of primary key constraints (0 or 1) for each table in the Sakila schema:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT tbl.table_name,
(SELECT count(*)
FROM information_schema.columns clm
WHERE clm.table_schema = tbl.table_schema
AND clm.table_name = tbl.table_name) num_columns,
(SELECT count(*)
FROM information_schema.statistics sta
WHERE sta.table_schema = tbl.table_schema
AND sta.table_name = tbl.table_name) num_indexes,
(SELECT count(*)
FROM information_schema.table_constraints tc
WHERE tc.table_schema = tbl.table_schema
AND tc.table_name = tbl.table_name
AND tc.constraint_type = 'PRIMARY KEY') num_primary_keys
FROM information_schema.tables tbl
WHERE tbl.table_schema = 'sakila' AND tbl.table_type = 'BASE TABLE'
ORDER BY 1;
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">TABLE_NAME&lt;/th>
&lt;th>num_columns&lt;/th>
&lt;th>num_indexes&lt;/th>
&lt;th align="right">num_primary_keys&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">actor&lt;/td>
&lt;td>4&lt;/td>
&lt;td>2&lt;/td>
&lt;td align="right">1&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
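&lt;p>The same metadata views can drive narrower checks. The following sketch, assuming the Sakila schema on MySQL, lists any base table that was deployed without a primary key constraint; an empty result is the desired outcome.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*list base tables that have no PRIMARY KEY constraint*/
SELECT tbl.table_name
FROM information_schema.tables tbl
  LEFT JOIN information_schema.table_constraints tc
    ON tc.table_schema = tbl.table_schema
    AND tc.table_name = tbl.table_name
    AND tc.constraint_type = 'PRIMARY KEY'
WHERE tbl.table_schema = 'sakila'
  AND tbl.table_type = 'BASE TABLE'
  AND tc.constraint_name IS NULL
ORDER BY tbl.table_name;
&lt;/code>&lt;/pre>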
&lt;h2 id="dynamic-sql-generation">Dynamic SQL Generation&lt;/h2>
&lt;p>Most relational database servers, including SQL Server, Oracle Database, and MySQL, allow SQL statements to be submitted to the server as strings. Submitting strings to a database engine rather than utilizing its SQL interface is generally known as &lt;em>dynamic SQL execution&lt;/em>.&lt;/p>
&lt;p>&lt;em>Oracle’s PL/SQL language&lt;/em>&lt;/p>
&lt;p>&lt;code>execute immediate&lt;/code>&lt;/p>
&lt;p>&lt;em>SQL Server&lt;/em>&lt;/p>
&lt;p>&lt;code>sp_executesql&lt;/code>&lt;/p>
&lt;p>&lt;em>MySQL&lt;/em>&lt;/p>
&lt;p>&lt;code>prepare, execute, deallocate&lt;/code>&lt;/p>
&lt;pre>&lt;code class="language-sql">SET @qry = 'SELECT customer_id, first_name, last_name FROM customer';
PREPARE dynsql1 FROM @qry;
EXECUTE dynsql1;
DEALLOCATE PREPARE dynsql1;
/*conditions can be specified at runtime*/
SET @qry = 'SELECT customer_id, first_name, last_name FROM customer WHERE customer_id = ?';
PREPARE dynsql2 FROM @qry;
SET @custid = 9;
EXECUTE dynsql2 USING @custid;
SET @custid = 145;
EXECUTE dynsql2 USING @custid;
DEALLOCATE PREPARE dynsql2;
&lt;/code>&lt;/pre>
&lt;p>Or you can do the following:&lt;/p>
&lt;pre>
SELECT concat('SELECT ', concat_ws(',', cols.col1, cols.col2),
' FROM customer WHERE customer_id = ?')
&lt;b>INTO @qry &lt;/b>
FROM (SELECT
max(CASE WHEN ordinal_position = 1 THEN column_name
ELSE NULL END) col1,
max(CASE WHEN ordinal_position = 2 THEN column_name
ELSE NULL END) col2
FROM information_schema.columns
WHERE table_schema = 'sakila' AND table_name = 'customer'
GROUP BY table_name
) cols;
&lt;/pre>
&lt;pre>&lt;code class="language-sql">PREPARE dynsql3 FROM @qry;
SET @custid = 45;
EXECUTE dynsql3 USING @custid;
DEALLOCATE PREPARE dynsql3;
&lt;/code>&lt;/pre>
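&lt;p>The query above hard-codes the first two column positions. As a hedged alternative, MySQL’s &lt;code>group_concat&lt;/code> can assemble the whole column list in one pass. This sketch assumes the Sakila schema; the statement name &lt;code>dynsql4&lt;/code> is chosen only for illustration.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*build a SELECT that lists every column of customer in ordinal order*/
SELECT concat('SELECT ',
  group_concat(column_name ORDER BY ordinal_position SEPARATOR ', '),
  ' FROM customer WHERE customer_id = ?')
INTO @qry
FROM information_schema.columns
WHERE table_schema = 'sakila' AND table_name = 'customer';
/*inspect the generated text before running it*/
SELECT @qry;
PREPARE dynsql4 FROM @qry;
SET @custid = 45;
EXECUTE dynsql4 USING @custid;
DEALLOCATE PREPARE dynsql4;
&lt;/code>&lt;/pre>
&lt;p>Note that &lt;code>group_concat&lt;/code> truncates its result at the &lt;code>group_concat_max_len&lt;/code> setting, so very wide tables may need that limit raised.&lt;/p>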
&lt;p>Note: Generally, it would be better to generate the query using a procedural language that includes looping constructs, such as Java, PL/SQL, Transact-SQL, or MySQL’s Stored Procedure Language.&lt;/p></description></item><item><title>Learning SQL Notes #12: Views</title><link>https://siqi-zheng.rbind.io/post/2021-06-09-sql-notes-12/</link><pubDate>Wed, 09 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-09-sql-notes-12/</guid><description>&lt;p>Well-designed applications generally expose a public interface while keeping implementation details private, thereby enabling future design changes without impacting end users. When designing your database, you can achieve a similar result by keeping your tables private and allowing your users to access data only through a set of &lt;em>views&lt;/em>.&lt;/p>
&lt;h1 id="what-are-views">What Are Views?&lt;/h1>
&lt;pre>&lt;code class="language-sql">CREATE VIEW customer_vw
(customer_id,
first_name,
last_name,
email
)
AS
SELECT customer_id,
first_name,
last_name,
concat(substr(email,1,2), '*****', substr(email, -4)) email
FROM customer;
/*view the View*/
describe customer_vw;
/*group by, having, where, join etc. can also be used*/
&lt;/code>&lt;/pre>
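&lt;p>Once created, the view is queried like a table. A short sketch, assuming the &lt;code>customer_vw&lt;/code> view defined above exists in the Sakila schema:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*the email column comes back partially masked*/
SELECT first_name, last_name, email
FROM customer_vw
WHERE customer_id = 1;
/*views can be filtered, grouped, and sorted like any table*/
SELECT first_name, count(*) num_customers
FROM customer_vw
GROUP BY first_name
ORDER BY num_customers DESC, first_name;
&lt;/code>&lt;/pre>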
&lt;h1 id="why-use-views">Why Use Views?&lt;/h1>
&lt;ul>
&lt;li>
&lt;p>Data Security&lt;/p>
&lt;p>Oracle Database users have another option for securing both rows and columns of a table: Virtual Private Database (VPD). VPD allows you to attach policies to your tables, after which the server will modify a user’s query as necessary to enforce the policies.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Data Aggregation&lt;/p>
&lt;pre>&lt;code class="language-sql">CREATE VIEW sales_by_film_category AS
SELECT c.name AS category,
SUM(p.amount) AS total_sales
FROM payment AS p
INNER JOIN rental AS r
ON p.rental_id = r.rental_id
INNER JOIN inventory AS i
ON r.inventory_id = i.inventory_id
INNER JOIN film AS f
ON i.film_id = f.film_id
INNER JOIN film_category AS fc
ON f.film_id = fc.film_id
INNER JOIN category AS c
ON fc.category_id = c.category_id
GROUP BY c.name
ORDER BY total_sales DESC;
&lt;/code>&lt;/pre>
&lt;p>You have great flexibility! You can create a film_category_sales table, load it with aggregated data, and modify the sales_by_film_category view definition to retrieve data from this table if this improves the performance significantly.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hiding Complexity&lt;/p>
&lt;p>One of the most common reasons for deploying views is to shield end users from complexity.&lt;/p>
&lt;pre>&lt;code class="language-sql">CREATE VIEW film_stats AS
SELECT f.film_id, f.title, f.description, f.rating,
(SELECT c.name
FROM category c
INNER JOIN film_category fc
ON c.category_id = fc.category_id
WHERE fc.film_id = f.film_id) category_name,
(SELECT count(*)
FROM film_actor fa
WHERE fa.film_id = f.film_id ) num_actors,
(SELECT count(*)
FROM inventory i
WHERE i.film_id = f.film_id ) inventory_cnt,
(SELECT count(*)
FROM inventory i
INNER JOIN rental r
ON i.inventory_id = r.inventory_id
WHERE i.film_id = f.film_id ) num_rentals
FROM film f;
&lt;/code>&lt;/pre>
&lt;p>If someone uses this view but does not reference the category_name, num_actors, inventory_cnt, or num_rentals column, then none of the subqueries will be executed. This approach allows the view to be used for supplying descriptive information from the film table without unnecessarily joining five other tables.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Joining Partitioned Data&lt;/p>
&lt;p>Some database designs break large tables into multiple pieces in order to improve performance. For example, if the payment table became large, the designers may decide to break it into two tables: payment_current, which holds the latest six months of data, and payment_historic, which holds all data up to six months ago. You can make it look like all payment data is stored in a single table.&lt;/p>
&lt;pre>&lt;code class="language-sql">CREATE VIEW payment_all
(payment_id,
customer_id,
staff_id,
rental_id, amount,
payment_date,
last_update
) AS
SELECT payment_id, customer_id, staff_id, rental_id, amount, payment_date, last_update
FROM payment_historic
UNION ALL
SELECT payment_id, customer_id, staff_id, rental_id, amount, payment_date, last_update
FROM payment_current;
&lt;/code>&lt;/pre>
&lt;p>Using a view in this case is a good idea because it allows the designers to change the structure of the underlying data without the need to force all database users to modify their queries.&lt;/p>
&lt;/li>
&lt;/ul>
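&lt;p>For the data security case, a view can restrict rows as well as columns. The following is a sketch only: the view name &lt;code>active_customer_vw&lt;/code> is an assumption, and it relies on the &lt;code>active&lt;/code> column of the Sakila customer table.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*expose only active customers, with the email address masked*/
CREATE VIEW active_customer_vw
 (customer_id,
  first_name,
  last_name,
  email
 )
AS
SELECT customer_id,
  first_name,
  last_name,
  concat(substr(email,1,2), '*****', substr(email, -4)) email
FROM customer
WHERE active = 1;
&lt;/code>&lt;/pre>
&lt;p>Users who are granted access to this view, but not to the customer table, can see neither inactive customers nor full email addresses.&lt;/p>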
&lt;h1 id="updatable-views">Updatable Views&lt;/h1>
&lt;p>In the case of MySQL, a view is updatable if the following conditions are met:&lt;/p>
&lt;ul>
&lt;li>No aggregate functions are used (max(), min(), avg(), etc.).&lt;/li>
&lt;li>The view does not employ group by or having clauses.&lt;/li>
&lt;li>No subqueries exist in the select or from clause, and any subqueries in the where clause do not refer to tables in the from clause.&lt;/li>
&lt;li>The view does not utilize union, union all, or distinct.&lt;/li>
&lt;li>The from clause includes at least one table or updatable view.&lt;/li>
&lt;li>The from clause uses only inner joins if there is more than one table or view.&lt;/li>
&lt;/ul>
&lt;h2 id="updating-simple-views">Updating Simple Views&lt;/h2>
&lt;pre>&lt;code class="language-sql">UPDATE customer_vw
SET last_name = 'SMITH-ALLEN'
WHERE customer_id = 1;
&lt;/code>&lt;/pre>
&lt;p>You cannot &lt;code>insert&lt;/code> rows through a view that contains derived columns, even if the derived columns are not included in the statement. You also cannot modify a column that is derived from an expression.&lt;/p>
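&lt;p>These limits can be seen with &lt;code>customer_vw&lt;/code>, whose email column is built from an expression. A sketch, assuming that view exists in the Sakila schema; the literal values are made up for illustration.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*ask the server whether it considers each view updatable*/
SELECT table_name, is_updatable
FROM information_schema.views
WHERE table_schema = 'sakila';
/*expected to be rejected: email is derived from an expression*/
UPDATE customer_vw
SET email = 'MARY.SMITH-ALLEN@sakilacustomer.org'
WHERE customer_id = 1;
/*expected to be rejected: the view contains a derived column*/
INSERT INTO customer_vw (customer_id, first_name, last_name)
VALUES (99999, 'ROBERT', 'SIMPSON');
&lt;/code>&lt;/pre>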
&lt;h2 id="updating-complex-views">Updating Complex Views&lt;/h2>
&lt;p>For complex views with more than one table, you are allowed to modify both of the underlying tables separately, but not within a single statement. In order to insert data through a complex view, you would need to know from where each column is sourced. Since many views are created to hide complexity from end users, this seems to defeat the purpose if the users need to have explicit knowledge of the view definition.&lt;/p></description></item><item><title>Learning SQL Notes #11: Indexes and Constraints</title><link>https://siqi-zheng.rbind.io/post/2021-06-08-sql-notes-11/</link><pubDate>Tue, 08 Jun 2021 05:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-08-sql-notes-11/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#indexes">Indexes&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#index-creation">Index Creation&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#unique-indexes">Unique indexes&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#multicolumn-indexes">Multicolumn indexes&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#types-of-indexes">Types of Indexes&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#b-tree-indexes">B-tree indexes&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#bitmap-indexes">Bitmap indexes&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#text-indexes">Text indexes&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#how-indexes-are-used">How Indexes Are Used&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#the-downside-of-indexes">The Downside of Indexes&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#constraints">Constraints&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#constraint-creation">Constraint Creation&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h1 id="indexes">Indexes&lt;/h1>
&lt;p>When a row is inserted into a table, the server does not try to put the data in any particular location. It simply places the data in the next available location within the file (the server
maintains a list of free space for each table).&lt;/p>
&lt;p>To find all customers whose last name begins with Y, the server must visit each row in the customer table and inspect the contents of the last_name column; if the last name begins with Y, then the row is added to the result set. This type of access is known as a &lt;em>table&lt;/em> &lt;em>scan&lt;/em>.&lt;/p>
&lt;p>An index is simply a mechanism for finding a specific item within a resource. A database server uses indexes to locate rows in a table. Indexes are special tables that, unlike normal data tables, &lt;em>are&lt;/em> kept in a specific order. Instead of containing &lt;em>all&lt;/em> of the data about an entity, however, an index contains only the column (or columns) used to locate rows in the data table, along with information describing where the rows are physically located. Therefore, the role of indexes is to facilitate the retrieval of a subset of a table’s rows and columns &lt;em>without&lt;/em> the need to inspect every row in the table.&lt;/p>
&lt;h2 id="index-creation">Index Creation&lt;/h2>
&lt;pre>&lt;code class="language-sql">/*MySQL: add an index*/
ALTER TABLE customer
ADD INDEX idx_email (email);
/*MySQL: list the indexes on a table*/
SHOW INDEX FROM customer \G
/*MySQL: drop an index*/
ALTER TABLE customer
DROP INDEX idx_email;
/*SQL Server/Oracle Database: add an index*/
CREATE INDEX idx_email
ON customer (email);
&lt;/code>&lt;/pre>
&lt;p>Indexes can also be declared as part of the &lt;code>create table&lt;/code> statement:&lt;/p>
&lt;pre>
CREATE TABLE customer (
customer_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
...
&lt;b>PRIMARY KEY (customer_id),
KEY idx_fk_store_id (store_id),
KEY idx_fk_address_id (address_id),
KEY idx_last_name (last_name),&lt;/b>
...
&lt;/pre>
&lt;h3 id="unique-indexes">Unique indexes&lt;/h3>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
ALTER TABLE customer
ADD UNIQUE INDEX idx_email (email);
/*SQL Server/Oracle Database*/
CREATE UNIQUE INDEX idx_email
ON customer (email);
&lt;/code>&lt;/pre>
&lt;p>You should not build unique indexes on your primary key column(s), since the server already checks uniqueness for primary key values.&lt;/p>
&lt;h3 id="multicolumn-indexes">Multicolumn indexes&lt;/h3>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
ALTER TABLE customer
ADD INDEX idx_full_name (last_name, first_name);
/*SQL Server/Oracle Database*/
CREATE INDEX idx_full_name
ON customer (last_name, first_name);
&lt;/code>&lt;/pre>
&lt;h2 id="types-of-indexes">Types of Indexes&lt;/h2>
&lt;h3 id="b-tree-indexes">B-tree indexes&lt;/h3>
&lt;p>All the indexes shown thus far are &lt;em>balanced-tree indexes&lt;/em>, which are more commonly known as &lt;em>B-tree indexes&lt;/em>. MySQL, Oracle Database, and SQL Server all default to B-tree indexing.&lt;/p>
&lt;ul>
&lt;li>B-tree indexes are organized as trees, with one or more levels of branch nodes leading to a single level of leaf nodes.&lt;/li>
&lt;li>The server would look at the top branch node (called the root node) and follow the link to the branch node.&lt;/li>
&lt;li>The server can add or remove branch nodes to redistribute the values more evenly and can even add or remove an entire level of branch nodes.&lt;/li>
&lt;/ul>
&lt;h3 id="bitmap-indexes">Bitmap indexes&lt;/h3>
&lt;p>If there are only two different values (stored as 1 for active and 0 for inactive) and far more active customers, it can be difficult to maintain a balanced B-tree index as the number of customers grows.&lt;/p>
&lt;p>For columns that contain only a small number of values across a large number of rows (known as &lt;em>low-cardinality data&lt;/em>), Oracle Database includes bitmap indexes, which generate a bitmap for each value stored in the column.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*Oracle Database*/
CREATE BITMAP INDEX idx_active ON customer (active);
&lt;/code>&lt;/pre>
&lt;p>Bitmap indexes are commonly used in data warehousing environments, where large amounts of data are generally indexed on columns containing relatively few values (e.g., sales quarters, geographic regions, products, salespeople).&lt;/p>
&lt;h3 id="text-indexes">Text indexes&lt;/h3>
&lt;h2 id="how-indexes-are-used">How Indexes Are Used&lt;/h2>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
EXPLAIN
SELECT customer_id, first_name, last_name
FROM customer
WHERE first_name LIKE 'S%' AND last_name LIKE 'P%';
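/*SQL Server: show the plan for the statements that follow instead of running them*/
SET SHOWPLAN_TEXT ON;
/*Oracle Database: prefix the statement with EXPLAIN PLAN FOR*/
EXPLAIN PLAN FOR
SELECT customer_id, first_name, last_name
FROM customer
WHERE first_name LIKE 'S%' AND last_name LIKE 'P%';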
&lt;/code>&lt;/pre>
&lt;p>For this query, the server can employ any of the following strategies:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Scan all rows in the customer table.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the index on the last_name column to find all customers whose last name starts with P; then visit each row of the customer table to find only rows whose first name starts with S.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the index on the last_name and first_name columns to find all customers whose last name starts with P and whose first name starts with S.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>In the MySQL &lt;code>EXPLAIN&lt;/code> output for this query, the &lt;code>possible_keys&lt;/code> column tells you that the server could decide to use either the &lt;code>idx_last_name&lt;/code> or the &lt;code>idx_full_name&lt;/code> index, and the &lt;code>key&lt;/code> column tells you that the &lt;code>idx_full_name&lt;/code> index was chosen. Furthermore, the &lt;code>type&lt;/code> column tells you that a range scan will be utilized, meaning that the database server will be looking for a range of values in the index, rather than expecting to retrieve a single row.&lt;/p>
&lt;h2 id="the-downside-of-indexes">The Downside of Indexes&lt;/h2>
&lt;p>&lt;strong>Every index is a table&lt;/strong> (a special type of table but still a table). Therefore, every time a row is added to or removed from a table, all indexes on that table must be modified. When a row is updated, any indexes on the column or columns that were affected need to be modified as well. Therefore, the more indexes you have, the more work the server needs to do to keep all schema objects up-to-date, which tends to slow things down.&lt;/p>
&lt;p>Indexes also require &lt;strong>disk space&lt;/strong> as well as some amount of care from your administrators, so the best strategy is to add an index when a clear need arises. If you need an index for only special purposes, such as a monthly maintenance routine, you can always add the index, run the routine, and then drop the index until you need it again. In the case of data warehouses, where indexes are crucial during business hours as users run reports and ad hoc queries but are problematic when data is being loaded into the warehouse overnight, it is a common practice to drop the indexes before data is loaded and then re-create them before the warehouse opens for business.&lt;/p>
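&lt;p>The drop-and-recreate pattern described above might look like the following in MySQL. This is a sketch that assumes the &lt;code>idx_full_name&lt;/code> index from the multicolumn example already exists; the load step itself is left as a placeholder comment.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*before the maintenance window: remove the index so the load does not have to maintain it*/
ALTER TABLE customer
DROP INDEX idx_full_name;
/*... run the bulk load or maintenance routine here ...*/
/*afterwards: rebuild the index once, over the finished data*/
ALTER TABLE customer
ADD INDEX idx_full_name (last_name, first_name);
&lt;/code>&lt;/pre>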
&lt;p>In general, you should strive to have neither too many indexes nor too few. If you aren’t sure how many indexes you should have, you can use this strategy as a default:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Make sure all primary key columns are indexed (most servers automatically create unique indexes when you create primary key constraints). For multicolumn primary keys, consider building additional indexes on a subset of the primary key columns or on all the primary key columns but in a different order than the primary key constraint definition.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Build indexes on all columns that are referenced in foreign key constraints. Keep in mind that the server checks to make sure there are no child rows when a parent is deleted, so it must issue a query to search for a particular value in the column. If there’s no index on the column, the entire table must be scanned.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Index any columns that will frequently be used to retrieve data. Most date columns are good candidates, along with short (2- to 50-character) string columns.&lt;/p>
&lt;/li>
&lt;/ul>
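&lt;p>The foreign key guideline can be checked from metadata. The sketch below, assuming the Sakila schema on MySQL, lists foreign key columns that are not the leading column of any index. It is a simplification: it treats every column of a multicolumn foreign key separately. Because InnoDB creates an index on the referencing columns when one is missing, the result is expected to be empty on a standard Sakila installation.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*foreign key columns that do not lead any index*/
SELECT kcu.table_name, kcu.column_name, kcu.constraint_name
FROM information_schema.key_column_usage kcu
WHERE kcu.table_schema = 'sakila'
  AND kcu.referenced_table_name IS NOT NULL
  AND NOT EXISTS
   (SELECT 1
    FROM information_schema.statistics sta
    WHERE sta.table_schema = kcu.table_schema
      AND sta.table_name = kcu.table_name
      AND sta.column_name = kcu.column_name
      AND sta.seq_in_index = 1)
ORDER BY kcu.table_name, kcu.column_name;
&lt;/code>&lt;/pre>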
&lt;h1 id="constraints">Constraints&lt;/h1>
&lt;p>A constraint is simply a restriction placed on one or more columns of a table. There are several different types of constraints, including:&lt;/p>
&lt;p>&lt;em>Primary key constraints&lt;/em>
Identify the column or columns that guarantee uniqueness within a table&lt;/p>
&lt;p>&lt;em>Foreign key constraints&lt;/em>
Restrict one or more columns to contain only values found in another table’s primary key columns (may also restrict the allowable values in other tables if update cascade or delete cascade rules are established)&lt;/p>
&lt;p>&lt;em>Unique constraints&lt;/em>
Restrict one or more columns to contain unique values within a table (primary key constraints are a special type of unique constraint)&lt;/p>
&lt;p>&lt;em>Check constraints&lt;/em>
Restrict the allowable values for a column&lt;/p>
&lt;p>If the server allows you to change a customer’s ID in the customer table without changing the same customer ID in the rental table, then you will end up with rental data that no longer points to valid customer records (known as &lt;em>orphaned rows&lt;/em>). With primary and foreign key constraints in place, however, the server will either raise an error if an attempt is made to modify or delete data that is referenced by other tables or propagate the changes to other tables for you.&lt;/p>
&lt;p>Note: If you want to use foreign key constraints with the MySQL server, you must use the &lt;em>InnoDB&lt;/em> storage engine for your tables.&lt;/p>
&lt;h3 id="constraint-creation">Constraint Creation&lt;/h3>
&lt;pre>
CREATE TABLE customer (
...
&lt;b>PRIMARY KEY (customer_id), &lt;/b>
KEY idx_fk_store_id (store_id),
KEY idx_fk_address_id (address_id),
KEY idx_last_name (last_name),
&lt;b>CONSTRAINT fk_customer_address FOREIGN KEY (address_id) REFERENCES address (address_id) ON DELETE RESTRICT ON UPDATE CASCADE,
CONSTRAINT fk_customer_store FOREIGN KEY (store_id)REFERENCES store (store_id) ON DELETE RESTRICT ON UPDATE CASCADE&lt;/b>
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
/*For existing tables, you can add the constraints afterward*/
ALTER TABLE customer
&lt;b>ADD CONSTRAINT&lt;/b> fk_customer_address FOREIGN KEY (address_id)
REFERENCES address (address_id) ON DELETE RESTRICT ON UPDATE CASCADE;
ALTER TABLE customer
&lt;b>ADD CONSTRAINT&lt;/b> fk_customer_store FOREIGN KEY (store_id)
REFERENCES store (store_id) ON DELETE RESTRICT ON UPDATE CASCADE;
/*if you want to drop them*/
ALTER TABLE customer
&lt;b>DROP CONSTRAINT&lt;/b> fk_customer_address;
ALTER TABLE customer
&lt;b>DROP CONSTRAINT&lt;/b> fk_customer_store;
&lt;/pre>
&lt;ul>
&lt;li>on delete restrict, which will cause the server to raise an error if a row is deleted in the parent table (address or store) that is referenced in the child table (customer)&lt;/li>
&lt;li>on update cascade, which will cause the server to propagate a change to the primary key value of a parent table (address or store) to the child table (customer)&lt;/li>
&lt;/ul>
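&lt;p>Both rules can be observed against the Sakila data. The following is a sketch under stated assumptions: the foreign keys shown above are in place, and &lt;code>9999&lt;/code> is an address_id that is not already in use. The update runs inside a transaction that is rolled back, so the sample data is left unchanged.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*pick an address that is referenced by a customer row*/
SELECT address_id INTO @addr
FROM customer
WHERE customer_id = 1;
START TRANSACTION;
/*ON UPDATE CASCADE: the new key value is propagated to the child rows*/
UPDATE address
SET address_id = 9999
WHERE address_id = @addr;
SELECT customer_id, address_id
FROM customer
WHERE customer_id = 1;
ROLLBACK;
/*ON DELETE RESTRICT: expected to fail with a foreign key error*/
DELETE FROM address
WHERE address_id = @addr;
&lt;/code>&lt;/pre>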
&lt;table>&lt;thead>
&lt;tr>
&lt;th>Parameter&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>&lt;tbody>
&lt;tr>
&lt;td>&lt;code>ON DELETE NO ACTION&lt;/code>&lt;/td>
&lt;td>&lt;em>Default action.&lt;/em> If there are any existing references to the key being deleted, the transaction will fail at the end of the statement. The key can be updated, depending on the &lt;code>ON UPDATE&lt;/code> action. &lt;br>&lt;br>Alias: &lt;code>ON DELETE RESTRICT&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ON UPDATE NO ACTION&lt;/code>&lt;/td>
&lt;td>&lt;em>Default action.&lt;/em> If there are any existing references to the key being updated, the transaction will fail at the end of the statement. The key can be deleted, depending on the &lt;code>ON DELETE&lt;/code> action. &lt;br>&lt;br>Alias: &lt;code>ON UPDATE RESTRICT&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ON DELETE RESTRICT&lt;/code> / &lt;code>ON UPDATE RESTRICT&lt;/code>&lt;/td>
&lt;td>&lt;code>RESTRICT&lt;/code> and &lt;code>NO ACTION&lt;/code> are currently equivalent until options for deferring constraint checking are added. To set an existing foreign key action to &lt;code>RESTRICT&lt;/code>, the foreign key constraint must be dropped and recreated.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ON DELETE CASCADE&lt;/code> / &lt;code>ON UPDATE CASCADE&lt;/code>&lt;/td>
&lt;td>When a referenced foreign key is deleted or updated, all rows referencing that key are deleted or updated, respectively. If there are other alterations to the row, such as a &lt;code>SET NULL&lt;/code> or &lt;code>SET DEFAULT&lt;/code>, the delete will take precedence. &lt;br>&lt;br>Note that &lt;code>CASCADE&lt;/code> does not list objects it drops or updates, so it should be used cautiously.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ON DELETE SET NULL&lt;/code> / &lt;code>ON UPDATE SET NULL&lt;/code>&lt;/td>
&lt;td>When a referenced foreign key is deleted or updated, respectively, the columns of all rows referencing that key will be set to &lt;code>NULL&lt;/code>. The column must allow &lt;code>NULL&lt;/code> or this update will fail.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ON DELETE SET DEFAULT&lt;/code> / &lt;code>ON UPDATE SET DEFAULT&lt;/code>&lt;/td>
&lt;td>When a referenced foreign key is deleted or updated, the columns of all rows referencing that key are set to the default value for that column. &lt;br/>&lt;br/> If the default value for the column is null, or if no default value is provided and the column does not have a &lt;a href='https://siqi-zheng.rbind.io/docs/v21.1/not-null'>&lt;code>NOT NULL&lt;/code>&lt;/a> constraint, this will have the same effect as &lt;code>ON DELETE SET NULL&lt;/code> or &lt;code>ON UPDATE SET NULL&lt;/code>. The default value must still conform with all other constraints, such as &lt;code>UNIQUE&lt;/code>.&lt;/td>
&lt;/tr>
&lt;/tbody>&lt;/table></description></item><item><title>Learning SQL Notes #10: Transactions</title><link>https://siqi-zheng.rbind.io/post/2021-06-08-sql-notes-10/</link><pubDate>Tue, 08 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-08-sql-notes-10/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#multiuser-databases">Multiuser Databases&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#locking">Locking&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#lock-granularities">Lock Granularities&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#what-is-a-transaction">What Is a Transaction?&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#starting-a-transaction">Starting a Transaction&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#ending-a-transaction">Ending a Transaction&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#transaction-savepoints">Transaction Savepoints&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#choosing-a-storage-engine">Choosing a Storage Engine&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>Transactions: Mechanism used to group a set of SQL statements together such that either all or none of the statements succeed.&lt;/p>
&lt;h1 id="multiuser-databases">Multiuser Databases&lt;/h1>
&lt;h2 id="locking">Locking&lt;/h2>
&lt;p>Locks are the mechanism the database server uses to &lt;strong>control simultaneous use&lt;/strong> of data resources. When some portion of the database is locked, any other users wishing to modify (or possibly read) that data must wait until the lock has been released. Most database servers use one of two locking strategies:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Database writers must request and receive from the server a write lock to modify data, and database readers must request and receive from the server a read lock to query data. While multiple users can read data simultaneously, only one write lock is given out at a time for each table (or portion thereof), and read requests are blocked until the write lock is released. $\Rightarrow$ long wait times if there are many concurrent read and write requests. (Microsoft SQL Server/MySQL)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Database writers must request and receive from the server a write lock to modify data, but readers do not need any type of lock to query data. Instead, the server ensures that a reader sees a consistent view of the data (the data seems the same even though other users may be making modifications) from the time her query begins until her query has finished. This approach is known as &lt;em>versioning&lt;/em>. $\Rightarrow$ problematic if there are long-running queries while data is being modified. (Oracle Database/MySQL)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="lock-granularities">Lock Granularities&lt;/h2>
&lt;p>&lt;em>Table locks&lt;/em> $\Rightarrow$ less bookkeeping, longer waiting time
Keep multiple users from modifying data in the same table simultaneously&lt;/p>
&lt;p>&lt;em>Page locks&lt;/em>
Keep multiple users from modifying data on the same page (a page is a segment of memory generally in the range of 2 KB to 16 KB) of a table simultaneously&lt;/p>
&lt;p>&lt;em>Row locks&lt;/em> $\Rightarrow$ More bookkeeping, shorter waiting time
Keep multiple users from modifying the same row in a table simultaneously&lt;/p>
&lt;p>SQL Server will, under certain circumstances, &lt;em>escalate&lt;/em> locks from row to page, and from page to table, whereas Oracle Database will never escalate locks.&lt;/p>
&lt;h1 id="what-is-a-transaction">What Is a Transaction?&lt;/h1>
&lt;p>Transactions are needed because real systems are not ideal:&lt;/p>
&lt;ul>
&lt;li>Database servers do not enjoy 100% uptime&lt;/li>
&lt;li>Users do not always allow programs to finish executing&lt;/li>
&lt;li>Applications do not always complete without encountering fatal errors that halt execution&lt;/li>
&lt;/ul>
&lt;p>A &lt;em>transaction&lt;/em> is a device for grouping together multiple SQL statements such that either all or none of the statements succeed (a property known as atomicity).&lt;/p>
&lt;p>Ex:&lt;/p>
&lt;p>If you attempt to transfer $500 from your savings account to your checking account, you would be a bit upset if the money were successfully withdrawn from your savings account but never made it to your checking account. Whatever the reason for the failure (the server was shut down for maintenance, the request for a page lock on the account table timed out, etc.), you want your $500 back. To protect against this kind of error, the program that handles your transfer request would first &lt;strong>begin a transaction&lt;/strong>, then issue the SQL statements needed to move the money from your savings to your checking account, and, &lt;strong>if everything succeeds&lt;/strong>, end the transaction by issuing the &lt;strong>commit&lt;/strong> command. If something &lt;strong>unexpected&lt;/strong> &lt;strong>happens&lt;/strong>, however, the program would issue a &lt;strong>rollback&lt;/strong> command, which instructs the server to undo all changes made since the transaction began.&lt;/p>
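&lt;p>In MySQL, the transfer described above could be sketched as follows. The &lt;code>account_demo&lt;/code> table, its columns, and the balances are hypothetical and exist only for this illustration. The decision to commit or roll back would normally be made by the calling program after checking each statement’s result.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*hypothetical scratch table: account 1 is savings, account 2 is checking*/
CREATE TABLE account_demo
 (account_id INT PRIMARY KEY,
  avail_balance DECIMAL(10,2) NOT NULL
 ) ENGINE=InnoDB;
INSERT INTO account_demo VALUES (1, 1000.00), (2, 200.00);
START TRANSACTION;
/*withdraw from savings*/
UPDATE account_demo
SET avail_balance = avail_balance - 500
WHERE account_id = 1;
/*deposit into checking*/
UPDATE account_demo
SET avail_balance = avail_balance + 500
WHERE account_id = 2;
/*both updates succeeded, so make them permanent; on any failure issue ROLLBACK instead*/
COMMIT;
SELECT account_id, avail_balance FROM account_demo;
DROP TABLE account_demo;
&lt;/code>&lt;/pre>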
&lt;h2 id="starting-a-transaction">Starting a Transaction&lt;/h2>
&lt;p>Database servers handle transaction creation in one of two ways:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>An active transaction is always associated with a database session, so there is no need or method to explicitly begin a transaction. When the current transaction ends, the server automatically begins a new transaction for your session. &lt;em>This means that even a single statement can be rolled back.&lt;/em> (Oracle Database)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Unless you explicitly begin a transaction, individual SQL statements are automatically committed independently of one another. To begin a transaction, you must first issue a command. (Microsoft SQL Server/MySQL)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The SQL:2003 standard includes a &lt;code>start transaction&lt;/code> command to be used when you want to explicitly begin a transaction. While MySQL conforms to the standard, SQL Server users must instead issue the command &lt;code>begin transaction&lt;/code>. With both servers, until you explicitly begin a transaction, you are in what is known as &lt;em>autocommit mode&lt;/em>, which means that individual statements are automatically committed by the server.&lt;/p>
&lt;p>A word of advice: shut off autocommit mode each time you log in, and get in the habit of running all of your SQL statements within a transaction.&lt;/p>
&lt;p>Both MySQL and SQL Server allow you to turn off autocommit mode for individual sessions, in which case the servers will act just like Oracle Database regarding transactions. With SQL Server, you issue the following command to disable autocommit mode:&lt;/p>
&lt;p>&lt;code>SET IMPLICIT_TRANSACTIONS ON&lt;/code>&lt;/p>
&lt;p>MySQL allows you to disable autocommit mode via the following:&lt;/p>
&lt;p>&lt;code>SET AUTOCOMMIT=0&lt;/code>&lt;/p>
&lt;p>Once you have left autocommit mode, all SQL commands take place within the scope of a transaction and must be explicitly committed or rolled back.&lt;/p>
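&lt;p>A brief MySQL session sketch of this behavior, assuming the Sakila customer table. The update is rolled back, so the data is unchanged at the end.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*1 means autocommit mode is on for this session*/
SELECT @@autocommit;
SET AUTOCOMMIT=0;
/*this change is now pending until it is committed or rolled back*/
UPDATE customer
SET last_name = 'SMITH-ALLEN'
WHERE customer_id = 1;
ROLLBACK;
/*the original last name is back*/
SELECT last_name FROM customer WHERE customer_id = 1;
/*restore the default behavior for the rest of the session*/
SET AUTOCOMMIT=1;
&lt;/code>&lt;/pre>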
&lt;h2 id="ending-a-transaction">Ending a Transaction&lt;/h2>
&lt;p>End a transaction with &lt;code>commit&lt;/code> to make its changes permanent, or with &lt;code>rollback&lt;/code> to undo all changes made since the transaction began.&lt;/p>
&lt;p>Some scenarios in practice:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The server shuts down, in which case your transaction will be rolled back automatically when the server is restarted. ✔&lt;/p>
&lt;/li>
&lt;li>
&lt;p>You issue an SQL schema statement, such as alter table, which will cause the current transaction to be committed and a new transaction to be started.&lt;/p>
&lt;ul>
&lt;li>Be careful that the statements that comprise a unit of work are not inadvertently broken up into multiple transactions by the server!&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>You issue another start transaction command, which will cause the previous transaction to be committed. ✔&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The server prematurely ends your transaction because the server detects a deadlock and decides that your transaction is the culprit. In this case, the transaction
will be rolled back, and you will receive an error message.&lt;/p>
&lt;ul>
&lt;li>Most of the time, the terminated transaction can be restarted and will succeed without encountering another deadlock situation.&lt;br>
&lt;code>Message: Deadlock found when trying to get lock; try restarting transaction&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
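&lt;p>Restarting a transaction that was chosen as the deadlock victim could look roughly like the sketch below. Everything in it is hypothetical: &lt;code>DeadlockError&lt;/code> stands in for whatever exception your database driver raises, and &lt;code>flaky_transaction&lt;/code> only simulates a unit of work that fails on its first attempt.&lt;/p>

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the error a database driver raises on deadlock."""


def run_with_retry(work, attempts=3, delay=0.0):
    """Call work() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except DeadlockError:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay)  # brief pause before restarting the transaction


calls = []


def flaky_transaction():
    # Simulate a transaction that is rolled back by the server on its first try.
    calls.append(1)
    if len(calls) == 1:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, len(calls))  # committed 2
```

&lt;p>In real code the whole unit of work, starting from &lt;code>start transaction&lt;/code>, has to be repeated, because the server has already rolled it back.&lt;/p>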
&lt;h2 id="transaction-savepoints">Transaction Savepoints&lt;/h2>
&lt;p>You may not want to undo &lt;em>all&lt;/em> of the work that has transpired. For these situations, you can establish one or more &lt;em>savepoints&lt;/em>&lt;/p>
&lt;pre>&lt;code class="language-sql">SAVEPOINT my_savepoint;
&lt;/code>&lt;/pre>
&lt;p>within a transaction and use them to roll back to a particular location within your transaction&lt;/p>
&lt;pre>&lt;code class="language-sql">ROLLBACK TO SAVEPOINT my_savepoint;
&lt;/code>&lt;/pre>
&lt;p>rather than rolling all the way back to the start of the transaction.&lt;/p>
&lt;h3 id="choosing-a-storage-engine">Choosing a Storage Engine&lt;/h3>
&lt;p>When using Oracle Database or Microsoft SQL Server, a single set of code is responsible for low-level database operations, such as retrieving a particular row from a table based on primary key value. The MySQL server, however, has been designed so that multiple storage engines may be utilized to provide low-level database functionality, including resource locking and transaction management. As of version 8.0, MySQL includes the following storage engines:&lt;/p>
&lt;p>&lt;em>MyISAM&lt;/em>
A nontransactional engine employing table locking&lt;/p>
&lt;p>&lt;em>MEMORY&lt;/em>
A nontransactional engine used for in-memory tables&lt;/p>
&lt;p>&lt;em>CSV&lt;/em>
A nontransactional engine that stores data in comma-separated files&lt;/p>
&lt;p>&lt;em>InnoDB&lt;/em>
A transactional engine employing row-level locking&lt;/p>
&lt;p>&lt;em>Merge&lt;/em>
A specialty engine used to make multiple identical &lt;em>MyISAM&lt;/em> tables appear as a single table (a.k.a. table partitioning)&lt;/p>
&lt;p>&lt;em>Archive&lt;/em>
A specialty engine used to store large amounts of unindexed data, mainly for archival purposes&lt;/p>
&lt;p>MySQL is flexible enough to allow you to choose a storage engine on a table-by-table basis.&lt;/p>
&lt;p>You may explicitly specify a storage engine when creating a table, or you can change an existing table to use a different engine.&lt;/p>
&lt;pre>&lt;code class="language-sql">SHOW TABLE STATUS LIKE 'customer' \G
/*The second line of the output shows the current engine, e.g. Engine: InnoDB*/
ALTER TABLE customer ENGINE = INNODB; /*switch the table to the InnoDB engine*/
&lt;/code>&lt;/pre>
&lt;p>One example is shown below:&lt;/p>
&lt;pre>&lt;code class="language-sql">START TRANSACTION;
UPDATE product
SET date_retired = CURRENT_TIMESTAMP()
WHERE product_cd = 'XYZ';
SAVEPOINT before_close_accounts;
UPDATE account
SET status = 'CLOSED', close_date = CURRENT_TIMESTAMP(), last_activity_date = CURRENT_TIMESTAMP()
WHERE product_cd = 'XYZ';
ROLLBACK TO SAVEPOINT before_close_accounts;
COMMIT;
/*The net effect of this transaction is that the mythical XYZ product is retired but none of the accounts are closed.*/
&lt;/code>&lt;/pre>
&lt;p>When using savepoints, remember the following:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Despite the name, nothing is saved when you create a savepoint. You must eventually issue a commit if you want your transaction to be made permanent.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If you issue a rollback without naming a savepoint, all savepoints within the transaction will be ignored, and the entire transaction will be undone.&lt;/p>
&lt;/li>
&lt;/ul>
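&lt;p>The savepoint behavior described above can be checked with a small sketch using Python's built-in &lt;code>sqlite3&lt;/code> module. SQLite is used here only as a stand-in for MySQL, and the two tables are simplified, made-up versions of &lt;code>product&lt;/code> and &lt;code>account&lt;/code>.&lt;/p>

```python
import sqlite3

# Autocommit mode, so BEGIN/COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE product (product_cd TEXT, retired INTEGER)")
conn.execute("CREATE TABLE account (product_cd TEXT, status TEXT)")
conn.execute("INSERT INTO product VALUES ('XYZ', 0)")
conn.execute("INSERT INTO account VALUES ('XYZ', 'ACTIVE')")

conn.execute("BEGIN")
conn.execute("UPDATE product SET retired = 1 WHERE product_cd = 'XYZ'")
conn.execute("SAVEPOINT before_close_accounts")
conn.execute("UPDATE account SET status = 'CLOSED' WHERE product_cd = 'XYZ'")
# Undo only the work done after the savepoint; the transaction stays open.
conn.execute("ROLLBACK TO SAVEPOINT before_close_accounts")
conn.execute("COMMIT")  # nothing is permanent until this point

retired = conn.execute("SELECT retired FROM product").fetchone()[0]
status = conn.execute("SELECT status FROM account").fetchone()[0]
print(retired, status)  # 1 ACTIVE
```

&lt;p>The product is retired, but the account stays active because the second update was rolled back to the savepoint before the commit.&lt;/p>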
&lt;p>If you are using &lt;em>SQL Server&lt;/em>, you will need to use the proprietary command &lt;code>save transaction&lt;/code> to create a savepoint and &lt;code>rollback transaction&lt;/code> to roll back to a savepoint, with each command being followed by the savepoint name.&lt;/p></description></item><item><title>Learning SQL Notes #9: Conditional Logic</title><link>https://siqi-zheng.rbind.io/post/2021-06-07-sql-notes-9/</link><pubDate>Mon, 07 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-07-sql-notes-9/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#what-is-conditional-logic">What Is Conditional Logic?&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#the-case-expression">The case Expression&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#searched-case-expressions">Searched case Expressions&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#simple-case-expressions-a-less-flexible-ver-of-the-previous-expression">Simple case Expressions (A less flexible ver. of the previous expression)&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#examples-of-case-expressions">Examples of case Expressions&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#result-set-transformations">Result Set Transformations&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#checking-for-existence">Checking for Existence&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#avoid-division-by-zero-errors">(Avoid) Division-by-Zero Errors&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#conditional-updates">Conditional Updates&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#handling-null-values">Handling Null Values&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h1 id="what-is-conditional-logic">What Is Conditional Logic?&lt;/h1>
&lt;p>Conditional logic is simply the ability to take one of several paths during program execution.&lt;/p>
&lt;p>It is analogous to if-else in Python and R.&lt;/p>
&lt;PRE>
SELECT first_name, last_name,
&lt;B>CASE&lt;/B>
WHEN active = 1 THEN 'ACTIVE'
ELSE 'INACTIVE'
&lt;B>END&lt;/B> activity_type
FROM customer;
&lt;/PRE>
&lt;h2 id="the-case-expression">The case Expression&lt;/h2>
&lt;ul>
&lt;li>The case expression is part of the SQL standard (SQL92 release) and has been implemented by Oracle Database, SQL Server, MySQL, PostgreSQL, IBM UDB, and others.&lt;/li>
&lt;li>case expressions are built into the SQL grammar and can be included in select, insert, update, and delete statements.&lt;/li>
&lt;/ul>
&lt;h3 id="searched-case-expressions">Searched case Expressions&lt;/h3>
&lt;pre>&lt;code class="language-sql">CASE
WHEN category.name IN ('Children','Family','Sports','Animation')
THEN 'All Ages'
WHEN category.name = 'Horror'
THEN 'Adult'
WHEN category.name IN ('Music','Games')
THEN 'Teens'
ELSE 'Other'
END
&lt;/code>&lt;/pre>
&lt;PRE>
SELECT c.first_name, c.last_name,
CASE
WHEN active = 0 THEN 0
&lt;B>ELSE
(SELECT count(*) FROM rental r
WHERE r.customer_id = c.customer_id)&lt;/B>
END num_rentals /*Create new variables*/
FROM customer c;
&lt;/PRE>
&lt;h3 id="simple-case-expressions-a-less-flexible-ver-of-the-previous-expression">Simple case Expressions (A less flexible ver. of the previous expression)&lt;/h3>
&lt;PRE>
CASE &lt;B>V0&lt;/B>
WHEN V1 THEN E1
WHEN V2 THEN E2 ...
WHEN VN THEN EN
[ELSE ED]
END
&lt;/PRE>
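&lt;p>V0 represents a value, and the symbols V1, V2, &amp;hellip;, VN represent values that are to be compared to V0. E1, E2, &amp;hellip;, EN are the expressions returned when the corresponding value matches, and ED is returned if none of them does.&lt;/p>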
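&lt;p>A minimal sketch of a simple case expression, run through Python's built-in &lt;code>sqlite3&lt;/code> module (SQLite as a stand-in for MySQL; the &lt;code>category&lt;/code> rows are made up):&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (name TEXT)")
conn.executemany("INSERT INTO category VALUES (?)",
                 [("Children",), ("Horror",), ("Music",), ("Drama",)])

rows = conn.execute("""
    SELECT name,
           CASE name                      -- V0: the value being compared
               WHEN 'Children' THEN 'All Ages'
               WHEN 'Horror'   THEN 'Adult'
               WHEN 'Music'    THEN 'Teens'
               ELSE 'Other'               -- ED: returned when nothing matches
           END AS audience
    FROM category
    ORDER BY name
""").fetchall()
print(rows)
```

&lt;p>A simple case expression can only test V0 for equality, which is why a condition such as &lt;code>IN (...)&lt;/code> needs the searched form.&lt;/p>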
&lt;h2 id="examples-of-case-expressions">Examples of case Expressions&lt;/h2>
&lt;h3 id="result-set-transformations">Result Set Transformations&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT monthname(rental_date) rental_month,
count(*) num_rentals
FROM rental
WHERE rental_date BETWEEN '2005-05-01' AND '2005-08-01'
GROUP BY monthname(rental_date);
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">rental_month&lt;/th>
&lt;th align="right">num_rentals&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">May&lt;/td>
&lt;td align="right">1156&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">June&lt;/td>
&lt;td align="right">2311&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">July&lt;/td>
&lt;td align="right">6709&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-sql">SELECT
SUM(CASE WHEN monthname(rental_date) = 'May' THEN 1
ELSE 0 END) May_rentals,
SUM(CASE WHEN monthname(rental_date) = 'June' THEN 1
ELSE 0 END) June_rentals,
SUM(CASE WHEN monthname(rental_date) = 'July' THEN 1
ELSE 0 END) July_rentals
FROM rental
WHERE rental_date BETWEEN '2005-05-01' AND '2005-08-01';
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">May_rentals&lt;/th>
&lt;th>June_rentals&lt;/th>
&lt;th align="right">July_rentals&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">1156&lt;/td>
&lt;td>2311&lt;/td>
&lt;td align="right">6709&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>When the monthname() function returns the desired value for that column, the case expression returns the value 1; otherwise, it returns a 0. When summed over all rows, each column returns the number of rentals for that month. Obviously, such transformations are practical for only a small number of values.&lt;/p>
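&lt;p>The same row-to-column transformation can be tried on a handful of made-up rentals with Python's built-in &lt;code>sqlite3&lt;/code> module. SQLite has no &lt;code>monthname()&lt;/code>, so this sketch assumes &lt;code>strftime('%m', ...)&lt;/code> as a substitute.&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_date TEXT)")
conn.executemany("INSERT INTO rental VALUES (?)", [
    ("2005-05-24",), ("2005-05-30",),
    ("2005-06-15",),
    ("2005-07-06",), ("2005-07-08",), ("2005-07-29",),
])

# Each SUM adds 1 for rows of its own month and 0 for all other rows.
row = conn.execute("""
    SELECT SUM(CASE WHEN strftime('%m', rental_date) = '05' THEN 1 ELSE 0 END) AS may_rentals,
           SUM(CASE WHEN strftime('%m', rental_date) = '06' THEN 1 ELSE 0 END) AS june_rentals,
           SUM(CASE WHEN strftime('%m', rental_date) = '07' THEN 1 ELSE 0 END) AS july_rentals
    FROM rental
""").fetchone()
print(row)  # (2, 1, 3)
```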
&lt;h3 id="checking-for-existence">Checking for Existence&lt;/h3>
&lt;p>Sometimes you will want to determine whether a relationship exists between two entities &lt;strong>without regard for the quantity&lt;/strong>.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT a.first_name, a.last_name,
CASE
WHEN EXISTS (SELECT 1 FROM film_actor fa
INNER JOIN film f ON fa.film_id = f.film_id
WHERE fa.actor_id = a.actor_id
AND f.rating = 'G') THEN 'Y'
ELSE 'N'
END g_actor
FROM actor a
WHERE a.last_name LIKE 'S%' OR a.first_name LIKE 'S%';
&lt;/code>&lt;/pre>
&lt;h3 id="avoid-division-by-zero-errors">(Avoid) Division-by-Zero Errors&lt;/h3>
&lt;pre>&lt;code class="language-sql">...
sum(p.amount) /
CASE WHEN count(p.amount) = 0 THEN 1
ELSE count(p.amount)
END avg_payment
...
&lt;/code>&lt;/pre>
&lt;h3 id="conditional-updates">Conditional Updates&lt;/h3>
&lt;pre>&lt;code class="language-sql">UPDATE customer
SET active =
CASE
WHEN 90 &amp;lt;= (SELECT datediff(now(), max(rental_date))
FROM rental r
WHERE r.customer_id = customer.customer_id)
THEN 0
ELSE 1
END
WHERE active = 1;
/*if the number returned by the subquery is 90 or higher, the customer is marked as inactive.*/
&lt;/code>&lt;/pre>
&lt;h3 id="handling-null-values">Handling Null Values&lt;/h3>
&lt;pre>&lt;code class="language-sql">...
CASE
WHEN a.address IS NULL THEN 'Unknown'
ELSE a.address
END address,
...
&lt;/code>&lt;/pre>
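&lt;p>For null values inside a calculation, a small sketch with Python's built-in &lt;code>sqlite3&lt;/code> module shows the difference a case expression makes. The &lt;code>payment&lt;/code> table and its &lt;code>discount&lt;/code> column are hypothetical.&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount REAL, discount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?)",
                 [(10.0, 2.0), (5.0, None)])

rows = conn.execute("""
    SELECT amount - discount AS raw_net,       -- NULL when discount is NULL
           amount - CASE WHEN discount IS NULL THEN 0
                         ELSE discount
                    END AS net                 -- NULL translated to 0 first
    FROM payment
    ORDER BY amount DESC
""").fetchall()
print(rows)  # [(8.0, 8.0), (None, 5.0)]
```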
&lt;p>Note: For calculations, null values often cause a null result. When performing calculations, case expressions are useful for translating a null value into a number (usually 0 or 1) that will allow the calculation to yield a non-null value.&lt;/p></description></item><item><title>Learning SQL Notes #8: Subqueries</title><link>https://siqi-zheng.rbind.io/post/2021-06-06-sql-notes-8/</link><pubDate>Sun, 06 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-06-sql-notes-8/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#what-is-a-subquery">What Is a Subquery?&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#subquery-types">Subquery Types&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#noncorrelated-subqueries">Noncorrelated Subqueries&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#multiple-row-single-column-subqueries">Multiple-Row, Single-Column Subqueries&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#the-in-and-not-in-operators">The in and not in operators&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#the-all-operator">The all operator&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#the-any-operator-or">The any operator (OR)&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#multicolumn-subqueries">Multicolumn Subqueries&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#correlated-subqueries">Correlated Subqueries&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#the-exists-operator">The exists Operator&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#data-manipulation-using-correlated-subqueries">Data Manipulation Using Correlated Subqueries&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#when-to-use-subqueries">When to Use Subqueries&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#subqueries-as-data-sources">Subqueries as Data Sources&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#data-fabrication">Data fabrication&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#task-oriented-subqueries">Task-oriented subqueries&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#common-table-expressions">Common table expressions&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#subqueries-as-expression-generators">Subqueries as Expression Generators&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#subquery-wrap-up">Subquery Wrap-Up&lt;/a>&lt;/li>
&lt;/ul>
&lt;h1 id="what-is-a-subquery">What Is a Subquery?&lt;/h1>
&lt;p>A &lt;em>subquery&lt;/em> is a query contained within another SQL statement (which I refer to as the containing statement for the rest of this discussion). A subquery is always enclosed within parentheses, and it is usually executed prior to the containing statement. Like any query, a subquery returns a result set that may consist of:&lt;/p>
&lt;ul>
&lt;li>A single row with a single column&lt;/li>
&lt;li>Multiple rows with a single column&lt;/li>
&lt;li>Multiple rows having multiple columns&lt;/li>
&lt;/ul>
&lt;pre>
SELECT customer_id, first_name, last_name
FROM customer
WHERE customer_id = &lt;b>(SELECT MAX(customer_id) FROM customer);&lt;/b>
&lt;/pre>
&lt;h1 id="subquery-types">Subquery Types&lt;/h1>
&lt;h2 id="noncorrelated-subqueries">Noncorrelated Subqueries&lt;/h2>
&lt;h3 id="multiple-row-single-column-subqueries">Multiple-Row, Single-Column Subqueries&lt;/h3>
&lt;h4 id="the-in-and-not-in-operators">The in and not in operators&lt;/h4>
&lt;pre>
SELECT city_id, city
FROM city
WHERE country_id &lt;> &lt;b>(SELECT country_id FROM country WHERE country = 'India');&lt;/b>
&lt;/pre>
&lt;p>Note: when a subquery is compared with an equality or inequality operator, as in the &lt;code>WHERE&lt;/code> clause above, it must not return more than one row.&lt;/p>
&lt;p>To match against several values, you can use queries such as the following:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT country_id
FROM country
WHERE country IN ('Canada','Mexico');
&lt;/code>&lt;/pre>
&lt;p>or&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT country_id
FROM country
WHERE country = 'Canada' OR country = 'Mexico';
&lt;/code>&lt;/pre>
&lt;p>as subqueries with the &lt;code>in&lt;/code> operator:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT city_id, city
FROM city
WHERE country_id IN
(SELECT country_id
FROM country
WHERE country IN ('Canada','Mexico'));
&lt;/code>&lt;/pre>
&lt;p>or the opposite:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT city_id, city
FROM city
WHERE country_id NOT IN
(SELECT country_id
FROM country
WHERE country IN ('Canada','Mexico'));
&lt;/code>&lt;/pre>
&lt;h4 id="the-all-operator">The all operator&lt;/h4>
&lt;p>The all operator allows you to make comparisons between a single value and every value in a set:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT first_name, last_name
FROM customer
WHERE customer_id &amp;lt;&amp;gt; ALL
(SELECT customer_id
FROM payment
WHERE amount = 0);
&lt;/code>&lt;/pre>
&lt;p>or the equivalent:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT first_name, last_name
FROM customer
WHERE customer_id NOT IN
(SELECT customer_id
FROM payment
WHERE amount = 0);
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Any attempt to equate a value to null yields unknown, so when using &lt;code>not in&lt;/code> or &lt;code>&amp;lt;&amp;gt; all&lt;/code> to compare a value to a set of values, you must be careful to ensure that the set of values does not contain a null value.&lt;/strong>&lt;/p>
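&lt;p>The null pitfall is easy to reproduce. The sketch below uses Python's built-in &lt;code>sqlite3&lt;/code> module with made-up rows (SQLite standing in for MySQL); one payment row deliberately has a null &lt;code>customer_id&lt;/code>.&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER)")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO payment VALUES (?, ?)",
                 [(1, 0.0), (None, 0.0)])  # second row has a NULL customer_id

query = """
    SELECT customer_id FROM customer
    WHERE customer_id NOT IN
          (SELECT customer_id FROM payment WHERE amount = 0 {extra})
    ORDER BY customer_id
"""
# With a NULL in the set, every comparison is false or unknown: no rows.
with_null = conn.execute(query.format(extra="")).fetchall()
# Filtering the NULL out of the subquery restores the expected answer.
without_null = conn.execute(
    query.format(extra="AND customer_id IS NOT NULL")).fetchall()
print(with_null, without_null)  # [] [(2,), (3,)]
```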
&lt;p>The subquery in the next example returns the total number of film rentals for each customer in North America, and the containing query returns all customers whose total number of film rentals exceeds that of every North American customer.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT customer_id, count(*)
FROM rental
GROUP BY customer_id
HAVING count(*) &amp;gt; ALL
(SELECT count(*)
FROM rental r
INNER JOIN customer c
ON r.customer_id = c.customer_id
INNER JOIN address a
ON c.address_id = a.address_id
INNER JOIN city ct
ON a.city_id = ct.city_id
INNER JOIN country co
ON ct.country_id = co.country_id
WHERE co.country IN ('United States','Mexico','Canada')
GROUP BY r.customer_id
);
&lt;/code>&lt;/pre>
&lt;h4 id="the-any-operator-or">The any operator (OR)&lt;/h4>
&lt;p>A condition using the any operator evaluates to true as soon as a single comparison is favorable.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT customer_id, sum(amount)
FROM payment
GROUP BY customer_id
HAVING sum(amount) &amp;gt; ANY
(SELECT sum(amount)
FROM payment p
INNER JOIN customer c
ON p.customer_id = c.customer_id
INNER JOIN address a
ON c.address_id = a.address_id
INNER JOIN city ct
ON a.city_id = ct.city_id
INNER JOIN country co
ON ct.country_id = co.country_id
WHERE co.country IN ('Bolivia','Paraguay','Chile')
GROUP BY co.country
);
&lt;/code>&lt;/pre>
&lt;h3 id="multicolumn-subqueries">Multicolumn Subqueries&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT actor_id, film_id
FROM film_actor
WHERE (actor_id, film_id) IN
(SELECT a.actor_id, f.film_id
FROM actor a
CROSS JOIN film f
WHERE a.last_name = 'MONROE'
AND f.rating = 'PG');
&lt;/code>&lt;/pre>
&lt;h2 id="correlated-subqueries">Correlated Subqueries&lt;/h2>
&lt;p>A &lt;em>correlated&lt;/em> &lt;em>subquery&lt;/em>, on the other hand, is &lt;em>dependent&lt;/em> on its containing statement from which it references one or more columns.&lt;/p>
&lt;pre>
SELECT c.first_name, c.last_name
FROM customer c
WHERE 20 =
(SELECT count(*)
FROM rental r
WHERE r.customer_id = &lt;b>c.customer_id&lt;/b>);
/*customers who have rented exactly 20 films*/
&lt;/pre>
&lt;h3 id="the-exists-operator">The exists Operator&lt;/h3>
&lt;p>You use the exists operator when you want to identify that a relationship exists without regard for the quantity.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name
FROM customer c
WHERE EXISTS /*use NOT EXISTS for the opposite test*/
(SELECT r.rental_date, r.customer_id, 'ABCD' str, 2 * 3 / 7 nmbr /*any select list works; SELECT 1 is the convention*/
FROM rental r
WHERE r.customer_id = c.customer_id
AND date(r.rental_date) &amp;lt; '2005-05-25');
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Since the condition in the containing query only needs to know whether any rows have been returned, the actual data the subquery returned is irrelevant.&lt;/strong>&lt;/p>
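&lt;p>To see that the subquery's select list plays no role, the sketch below runs the same &lt;code>exists&lt;/code> condition twice with different select lists, using Python's built-in &lt;code>sqlite3&lt;/code> module and made-up rows.&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE rental (customer_id INTEGER, rental_date TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "ANN"), (2, "BOB"), (3, "CAT")])
conn.executemany("INSERT INTO rental VALUES (?, ?)",
                 [(1, "2005-05-24"), (1, "2005-05-24"), (3, "2005-06-01")])

template = """
    SELECT c.name FROM customer c
    WHERE EXISTS (SELECT {cols} FROM rental r
                  WHERE r.customer_id = c.customer_id)
    ORDER BY c.name
"""
# Only the presence of at least one row matters, not what it contains.
with_one = conn.execute(template.format(cols="1")).fetchall()
with_cols = conn.execute(
    template.format(cols="r.rental_date, 'ABCD'")).fetchall()
print(with_one, with_cols)
```

&lt;p>Both runs return ANN and CAT once each, even though ANN has two rentals.&lt;/p>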
&lt;h3 id="data-manipulation-using-correlated-subqueries">Data Manipulation Using Correlated Subqueries&lt;/h3>
&lt;pre>&lt;code class="language-sql">UPDATE customer c
SET c.last_update =
 (SELECT max(r.rental_date)
  FROM rental r
  WHERE r.customer_id = c.customer_id);

/*The next update changes a row only if the condition in the update statement’s where clause evaluates to true (meaning that at least one rental was found for the customer), thus protecting the data in the last_update column from being overwritten with a null.*/
UPDATE customer c
SET c.last_update =
 (SELECT max(r.rental_date)
  FROM rental r
  WHERE r.customer_id = c.customer_id)
WHERE EXISTS
 (SELECT 1
  FROM rental r
  WHERE r.customer_id = c.customer_id);

/*Removes rows from the customer table where there have been no film rentals in the past year.*/
DELETE FROM customer
WHERE 365 &amp;lt; ALL
 (SELECT datediff(now(), r.rental_date) days_since_last_rental
  FROM rental r
  WHERE r.customer_id = customer.customer_id);
&lt;/code>&lt;/pre>
&lt;h1 id="when-to-use-subqueries">When to Use Subqueries&lt;/h1>
&lt;h2 id="subqueries-as-data-sources">Subqueries as Data Sources&lt;/h2>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name, pymnt.num_rentals, pymnt.tot_payments
FROM customer c
INNER JOIN
(SELECT customer_id, count(*) num_rentals, sum(amount) tot_payments
FROM payment
GROUP BY customer_id ) pymnt /*execute first*/
ON c.customer_id = pymnt.customer_id;
&lt;/code>&lt;/pre>
&lt;h3 id="data-fabrication">Data fabrication&lt;/h3>
&lt;p>First, fabricate a small table that defines the customer groups (small/average/heavy) with lower and upper bounds on total payments.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT 'Small Fry' name, 0 low_limit, 74.99 high_limit
UNION ALL
SELECT 'Average Joes' name, 75 low_limit, 149.99 high_limit
UNION ALL
SELECT 'Heavy Hitters' name, 150 low_limit, 9999999.99 high_limit;
&lt;/code>&lt;/pre>
&lt;p>Then join the fabricated table to the per-customer payment totals to count the customers in each group.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT pymnt_grps.name, count(*) num_customers
FROM
(SELECT customer_id, count(*) num_rentals, sum(amount) tot_payments
FROM payment
GROUP BY customer_id) pymnt
INNER JOIN (SELECT 'Small Fry' name, 0 low_limit, 74.99 high_limit
UNION ALL
SELECT 'Average Joes' name, 75 low_limit, 149.99 high_limit
UNION ALL
SELECT 'Heavy Hitters' name, 150 low_limit, 9999999.99 high_limit ) pymnt_grps
ON pymnt.tot_payments
BETWEEN pymnt_grps.low_limit AND pymnt_grps.high_limit
GROUP BY pymnt_grps.name;
&lt;/code>&lt;/pre>
&lt;h3 id="task-oriented-subqueries">Task-oriented subqueries&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name, ct.city,
sum(p.amount) tot_payments, count(*) tot_rentals
FROM payment p
INNER JOIN customer c
ON p.customer_id = c.customer_id
INNER JOIN address a
ON c.address_id = a.address_id
INNER JOIN city ct
ON a.city_id = ct.city_id
GROUP BY c.first_name, c.last_name, ct.city;
&lt;/code>&lt;/pre>
&lt;p>The names and cities are needed for display purposes only, so we can group the payment data in a subquery first and then join the other tables. This is a more efficient version of the same task:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name, ct.city, pymnt.tot_payments, pymnt.tot_rentals
FROM (SELECT customer_id, count(*) tot_rentals, sum(amount) tot_payments
FROM payment
GROUP BY customer_id) pymnt
INNER JOIN customer c
ON pymnt.customer_id = c.customer_id
INNER JOIN address a
ON c.address_id = a.address_id
INNER JOIN city ct
ON a.city_id = ct.city_id;
&lt;/code>&lt;/pre>
&lt;h3 id="common-table-expressions">Common table expressions&lt;/h3>
&lt;pre>&lt;code class="language-sql">WITH actors_s AS
(SELECT actor_id, first_name, last_name
FROM actor
WHERE last_name LIKE 'S%'
) /*can be used in the subsequent queries*/
...
&lt;/code>&lt;/pre>
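&lt;p>As a self-contained illustration of how a common table expression is consumed by the query that follows it, here is a sketch with Python's built-in &lt;code>sqlite3&lt;/code> module. The &lt;code>actor&lt;/code> rows are made up, and the final &lt;code>SELECT&lt;/code> is just one possible use of the CTE.&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO actor VALUES (?, ?, ?)", [
    (1, "JOE", "SWANK"), (2, "ED", "CHASE"), (3, "DAN", "STREEP"),
])

rows = conn.execute("""
    WITH actors_s AS
        (SELECT actor_id, first_name, last_name
         FROM actor
         WHERE last_name LIKE 'S%')
    SELECT first_name, last_name      -- the CTE is queried like a table
    FROM actors_s
    ORDER BY last_name
""").fetchall()
print(rows)  # [('DAN', 'STREEP'), ('JOE', 'SWANK')]
```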
&lt;h2 id="subqueries-as-expression-generators">Subqueries as Expression Generators&lt;/h2>
&lt;p>Correlated scalar subqueries can also generate the values of individual columns. Note that the customer table is accessed three times (once in each of the three subqueries) rather than just once.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT
 (SELECT c.first_name
  FROM customer c
  WHERE c.customer_id = p.customer_id) first_name,
 (SELECT c.last_name
  FROM customer c
  WHERE c.customer_id = p.customer_id) last_name,
 (SELECT ct.city
  FROM customer c
  INNER JOIN address a
  ON c.address_id = a.address_id
  INNER JOIN city ct
  ON a.city_id = ct.city_id
  WHERE c.customer_id = p.customer_id) city,
 sum(p.amount) tot_payments, count(*) tot_rentals
FROM payment p
GROUP BY p.customer_id;
&lt;/code>&lt;/pre>
&lt;p>Similarly,&lt;/p>
&lt;pre>&lt;code class="language-sql">INSERT INTO film_actor (actor_id, film_id, last_update)
VALUES (
 (SELECT actor_id
  FROM actor
  WHERE first_name = 'JENNIFER' AND last_name = 'DAVIS'),
 (SELECT film_id
  FROM film
  WHERE title = 'ACE GOLDFINGER'),
 now()
);
&lt;/code>&lt;/pre>
&lt;h1 id="subquery-wrap-up">Subquery Wrap-Up&lt;/h1>
&lt;ul>
&lt;li>Return a single column and row, a single column with multiple rows, and multiple columns and rows&lt;/li>
&lt;li>Are independent of the containing statement (noncorrelated subqueries)&lt;/li>
&lt;li>Reference one or more columns from the containing statement (correlated subqueries)&lt;/li>
&lt;li>Are used in conditions that utilize comparison operators as well as the special-purpose operators in, not in, exists, and not exists&lt;/li>
&lt;li>Can be found in select, update, delete, and insert statements&lt;/li>
&lt;li>Generate result sets that can be joined to other tables (or subqueries) in a query&lt;/li>
&lt;li>Can be used to generate values to populate a table or to populate columns in a query’s result set&lt;/li>
&lt;li>Are used in the select, from, where, having, and order by clauses of queries&lt;/li>
&lt;/ul>
&lt;p>Happy learning!&lt;/p>
&lt;p>&lt;img src="2.jpg" alt="">&lt;/p></description></item><item><title>Learning SQL Notes #7: Grouping and Aggregates (CH. 8)</title><link>https://siqi-zheng.rbind.io/post/2021-06-05-sql-notes-7/</link><pubDate>Sat, 05 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-05-sql-notes-7/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#grouping-concepts">Grouping Concepts&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#aggregate-functions">Aggregate Functions&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#generating-groups">Generating Groups&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#single-columnmulticolumn-grouping">Single-Column/Multicolumn Grouping&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#grouping-via-expressions">Grouping via Expressions&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#generating-rollups">Generating Rollups&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#group-filter-conditions">Group Filter Conditions&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="grouping-concepts">Grouping Concepts&lt;/h2>
&lt;pre>&lt;code class="language-sql">SELECT customer_id, count(*)
FROM rental
GROUP BY customer_id
HAVING count(*) &amp;gt;= 40
ORDER BY 2 DESC;
&lt;/code>&lt;/pre>
&lt;p>WARNING:&lt;/p>
&lt;p>&lt;del>WHERE count(*) &amp;gt;= 40&lt;/del> does not work: &lt;code>WHERE&lt;/code> is evaluated before the rows are grouped, so conditions on aggregate functions belong in &lt;code>HAVING&lt;/code>.&lt;/p>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(tidyverse)
rental %&amp;gt;%
group_by(customer_id) %&amp;gt;%
summarize(counts=n()) %&amp;gt;%
filter(counts&amp;gt;=40) %&amp;gt;%
arrange(desc(counts))
&lt;/code>&lt;/pre>
&lt;h2 id="aggregate-functions">Aggregate Functions&lt;/h2>
&lt;p>Some aggregate functions in SQL/R:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">SQL&lt;/th>
&lt;th align="right">R&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">count()&lt;/td>
&lt;td align="right">count()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">sum()&lt;/td>
&lt;td align="right">sum()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">avg()&lt;/td>
&lt;td align="right">mean()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">min()&lt;/td>
&lt;td align="right">min()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">max()&lt;/td>
&lt;td align="right">max()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">group_concat()&lt;/td>
&lt;td align="right">paste()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">first()&lt;/td>
&lt;td align="right">[1]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">last()&lt;/td>
&lt;td align="right">tail(x, 1)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-sql">SELECT COUNT(DISTINCT col1)
FROM string_tbl;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">length(unique(string_tbl$col1))
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Aggregate functions ignore NULL values; the exception is &lt;code>count(*)&lt;/code>, which counts every row.&lt;/strong>&lt;/p>
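&lt;p>A quick check of how nulls affect the different forms of &lt;code>count()&lt;/code>, sketched with Python's built-in &lt;code>sqlite3&lt;/code> module on a made-up &lt;code>string_tbl&lt;/code> that contains one null:&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE string_tbl (col1 TEXT)")
conn.executemany("INSERT INTO string_tbl VALUES (?)",
                 [("a",), ("a",), ("b",), (None,)])

row = conn.execute("""
    SELECT count(*),              -- every row, NULLs included
           count(col1),           -- non-NULL values only
           count(DISTINCT col1),  -- distinct non-NULL values
           max(col1)              -- the NULL is skipped here as well
    FROM string_tbl
""").fetchone()
print(row)  # (4, 3, 2, 'b')
```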
&lt;h2 id="generating-groups">Generating Groups&lt;/h2>
&lt;h3 id="single-columnmulticolumn-grouping">Single-Column/Multicolumn Grouping&lt;/h3>
&lt;p>Grouping can be done on 1 or more columns with aggregate functions.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT actor_id, count(*)
FROM film_actor
GROUP BY actor_id;
SELECT fa.actor_id, f.rating, count(*)
FROM film_actor fa
INNER JOIN film f
ON fa.film_id = f.film_id
GROUP BY fa.actor_id, f.rating
ORDER BY 1,2;
&lt;/code>&lt;/pre>
&lt;p>The &lt;strong>R&lt;/strong> code is analogous to the code in the previous section.&lt;/p>
&lt;h3 id="grouping-via-expressions">Grouping via Expressions&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT extract(YEAR FROM rental_date) year,
COUNT(*) how_many
FROM rental
GROUP BY extract(YEAR FROM rental_date);
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(tidyverse)
library(lubridate) # provides year()
rental %&amp;gt;%
mutate(year=year(rental_date)) %&amp;gt;%
group_by(year) %&amp;gt;%
summarize(counts=n())
&lt;/code>&lt;/pre>
&lt;h3 id="generating-rollups">Generating Rollups&lt;/h3>
&lt;p>Find the count for each actor and rating, plus a subtotal for each distinct actor and a grand total.&lt;/p>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
SELECT fa.actor_id, f.rating, count(*)
FROM film_actor fa
INNER JOIN film f
ON fa.film_id = f.film_id
GROUP BY fa.actor_id, f.rating WITH ROLLUP
ORDER BY 1,2;
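/*Oracle: replace the GROUP BY clause with*/
GROUP BY ROLLUP(fa.actor_id, f.rating)
/*Oracle also allows partial rollups, i.e. subtotals for only some of the grouping columns*/
GROUP BY a, ROLLUP(b, c)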
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">actor_id&lt;/th>
&lt;th>rating&lt;/th>
&lt;th align="right">count(*)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">NULL&lt;/td>
&lt;td>NULL&lt;/td>
&lt;td align="right">5462&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">1&lt;/td>
&lt;td>NULL&lt;/td>
&lt;td align="right">19&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">1&lt;/td>
&lt;td>G&lt;/td>
&lt;td align="right">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">1&lt;/td>
&lt;td>PG&lt;/td>
&lt;td align="right">6&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">1&lt;/td>
&lt;td>PG-13&lt;/td>
&lt;td align="right">1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">1&lt;/td>
&lt;td>R&lt;/td>
&lt;td align="right">3&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">1&lt;/td>
&lt;td>NC-17&lt;/td>
&lt;td align="right">5&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">2&lt;/td>
&lt;td>NULL&lt;/td>
&lt;td align="right">25&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">2&lt;/td>
&lt;td>G&lt;/td>
&lt;td align="right">7&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(reshape2)
library(zoo)
m &amp;lt;- melt(df, measure.vars = &amp;quot;sales&amp;quot;)
dout &amp;lt;- dcast(m, year + month + region ~ variable, fun.aggregate = sum, margins = &amp;quot;month&amp;quot;)
dout$month &amp;lt;- na.locf(replace(dout$month, dout$month == &amp;quot;(all)&amp;quot;, NA))
&lt;/code>&lt;/pre>
&lt;p>See here: &lt;a href="https://stackoverflow.com/questions/36169073/how-to-do-group-by-rollup-in-r-like-sql">https://stackoverflow.com/questions/36169073/how-to-do-group-by-rollup-in-r-like-sql&lt;/a>&lt;/p>
&lt;h2 id="group-filter-conditions">Group Filter Conditions&lt;/h2>
&lt;ul>
&lt;li>&lt;code>HAVING&lt;/code> with aggregate functions;&lt;/li>
&lt;li>&lt;code>WHERE&lt;/code> with original columns;&lt;/li>
&lt;/ul>
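&lt;p>The two kinds of filter can be combined in one query. The sketch below, using Python's built-in &lt;code>sqlite3&lt;/code> module with made-up rentals, shows &lt;code>WHERE&lt;/code> removing rows before grouping and &lt;code>HAVING&lt;/code> removing groups afterwards.&lt;/p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (customer_id INTEGER, rental_date TEXT)")
conn.executemany("INSERT INTO rental VALUES (?, ?)", [
    (1, "2005-05-01"), (1, "2005-06-01"), (1, "2005-07-01"),
    (2, "2005-06-10"), (2, "2005-06-20"),
    (3, "2005-04-01"), (3, "2005-04-02"), (3, "2005-06-05"),
])

rows = conn.execute("""
    SELECT customer_id, count(*) AS num_rentals
    FROM rental
    WHERE rental_date >= '2005-05-01'   -- filters rows before grouping
    GROUP BY customer_id
    HAVING count(*) >= 2                -- filters groups after aggregation
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 3), (2, 2)]
```

&lt;p>Customer 3 has three rentals overall, but only one survives the &lt;code>WHERE&lt;/code> filter, so the group is dropped by &lt;code>HAVING&lt;/code>.&lt;/p>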
&lt;p>&lt;img src="2.gif" alt="">&lt;/p></description></item><item><title>Learning SQL Notes #6: Data Generation, Manipulation, and Conversion</title><link>https://siqi-zheng.rbind.io/post/2021-06-04-sql-notes-6/</link><pubDate>Fri, 04 Jun 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-04-sql-notes-6/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#working-with-string-data">Working with String Data&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#string-generation">String Generation&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#including-single-quotes">Including single quotes&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#including-special-characters">Including special characters&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#string-manipulation">String Manipulation&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#string-functions-that-return-numbers">String functions that return numbers&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#working-with-numeric-data">Working with Numeric Data&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#performing-arithmetic-functions--controlling-number-precision--handling-signed-data">Performing Arithmetic Functions &amp;amp; Controlling Number Precision &amp;amp; Handling Signed Data&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#working-with-temporal-data">Working with Temporal Data&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#dealing-with-time-zones">Dealing with Time Zones&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#generating-temporal-data">Generating Temporal Data&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#string-representations-of-temporal-data">String representations of temporal data&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#string-to-date-conversions">String-to-date conversions&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#manipulating-temporal-data">Manipulating Temporal Data&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#temporal-functions-that-return-dates">Temporal functions that return dates&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#temporal-functions-that-return-strings">Temporal functions that return strings&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#temporal-functions-that-return-numbers">Temporal functions that return numbers&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#conversion-functions">Conversion Functions&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#appendix-for-codes">Appendix for Codes&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="working-with-string-data">Working with String Data&lt;/h2>
&lt;h3 id="string-generation">String Generation&lt;/h3>
&lt;p>Types:&lt;/p>
&lt;p>CHAR
Holds fixed-length, blank-padded strings.&lt;/p>
&lt;p>varchar
Holds variable-length strings.&lt;/p>
&lt;p>text (MySQL and SQL Server) or clob (Oracle Database)
Holds very large variable-length strings (generally referred to as documents in this context).&lt;/p>
&lt;pre>&lt;code class="language-sql">CREATE TABLE string_tbl
(char_fld CHAR(30),
vchar_fld VARCHAR(30),
text_fld TEXT
);
INSERT INTO string_tbl (char_fld, vchar_fld, text_fld)
VALUES ('This is char data',
'This is varchar data',
'This is text data');
&lt;/code>&lt;/pre>
&lt;p>You can try to store a string longer than the column's 30-character limit:&lt;/p>
&lt;pre>&lt;code class="language-sql">UPDATE string_tbl
SET vchar_fld = 'This is a piece of extremely long varchar data';
&lt;/code>&lt;/pre>
&lt;p>but the server rejects the update:&lt;/p>
&lt;pre>&lt;code>ERROR 1406 (22001): Data too long for column 'vchar_fld' at row 1
&lt;/code>&lt;/pre>
&lt;p>NOTE: Since MySQL 5.7, “strict” mode is the default, which means that an error is raised when such problems arise, whereas older versions of the server &lt;strong>truncated the string and issued a warning&lt;/strong>.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT @@session.sql_mode;
SET sql_mode='ansi'; /*ANSI mode is not strict, so the older truncating behavior returns*/
SELECT @@session.sql_mode;
&lt;/code>&lt;/pre>
&lt;p>Now the extra characters are truncated, with a warning instead of an error.&lt;/p>
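&lt;p>A minimal sketch of what this looks like, assuming the session is still in ANSI mode and &lt;code>string_tbl&lt;/code> is the table created above: re-run the update, inspect the warning, and check what was stored.&lt;/p>
&lt;pre>&lt;code class="language-sql">UPDATE string_tbl
SET vchar_fld = 'This is a piece of extremely long varchar data';
/*The update succeeds this time; list the warning it raised*/
SHOW WARNINGS;
/*Only the first 30 characters were kept*/
SELECT vchar_fld
FROM string_tbl;
/*This is a piece of extremely l*/
&lt;/code>&lt;/pre>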
&lt;h4 id="including-single-quotes">Including single quotes&lt;/h4>
&lt;pre>&lt;code class="language-sql">SELECT quote(text_fld)
FROM string_tbl;
&lt;/code>&lt;/pre>
&lt;p>Output:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">QUOTE(text_fld)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">&amp;lsquo;This string didn't work, but it does now&amp;rsquo;&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
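&lt;p>The output above assumes that &lt;code>text_fld&lt;/code> already holds a string containing an apostrophe. A minimal sketch of storing one, escaping the single quote by doubling it; MySQL also accepts a backslash escape, as long as the &lt;code>NO_BACKSLASH_ESCAPES&lt;/code> SQL mode is not enabled:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*Double the single quote to escape it*/
UPDATE string_tbl
SET text_fld = 'This string didn''t work, but it does now';
/*MySQL alternative: escape it with a backslash*/
UPDATE string_tbl
SET text_fld = 'This string didn\'t work, but it does now';
&lt;/code>&lt;/pre>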
&lt;h4 id="including-special-characters">Including special characters&lt;/h4>
&lt;p>The SQL Server and MySQL servers include the built-in function &lt;code>char()&lt;/code> so that you can build strings from any of the character codes 0 to 255 of an 8-bit character set such as extended ASCII (Oracle Database users can use the &lt;code>chr()&lt;/code> function).&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT CHAR(128,129,130,131,132,133,134,135,136,137);
&lt;/code>&lt;/pre>
&lt;p>Output:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">CHAR(128,129,130,131,132,133,134,135,136,137)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">Çüéâäàåçêë&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">coderange &amp;lt;- c(128,129,130,131,132,133,134,135,136,137)
rawToChar(as.raw(coderange),multiple=TRUE)
&lt;/code>&lt;/pre>
&lt;p>You can also concatenate two strings:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT CONCAT('danke sch', CHAR(148), 'n');
&lt;/code>&lt;/pre>
&lt;p>Output:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">CONCAT(&amp;lsquo;danke sch&amp;rsquo;, CHAR(148), &amp;lsquo;n&amp;rsquo;)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">danke schön&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">paste('danke sch', rawToChar(as.raw(148)), 'n')
paste0()
&lt;/code>&lt;/pre>
&lt;p>See: &lt;a href="https://www.r-bloggers.com/2011/03/ascii-code-table-in-r/">https://www.r-bloggers.com/2011/03/ascii-code-table-in-r/&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Oracle Database/PostgreSQL users can use the concatenation operator (&lt;code>||&lt;/code>) instead of the &lt;code>concat()&lt;/code> function, as in:&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-sql">SELECT 'danke sch' || CHR(148) || 'n' FROM dual;
&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>SQL Server versions before 2012 do not include a &lt;code>concat()&lt;/code> function, so there you will need to use the concatenation operator (+), as in:&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-sql">SELECT 'danke sch' + CHAR(148) + 'n'
&lt;/code>&lt;/pre>
&lt;h3 id="string-manipulation">String Manipulation&lt;/h3>
&lt;h4 id="string-functions-that-return-numbers">String functions that return numbers&lt;/h4>
&lt;p>To find the length of a string:&lt;/p>
&lt;pre>&lt;code class="language-sql">LENGTH()
SELECT LENGTH(char_fld) char_length,
LENGTH(vchar_fld) varchar_length,
LENGTH(text_fld) text_length
FROM string_tbl;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">length()
&lt;/code>&lt;/pre>
&lt;p>To find the position of a substring within a string:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*POSITION() returns the 1-based position of a substring, or 0 if it is absent*/
SELECT POSITION('characters' IN vchar_fld)
FROM string_tbl;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">match('y',x)
which('y' %in% x)
&lt;/code>&lt;/pre>
&lt;p>Note: When working with databases, the &lt;strong>first&lt;/strong> character in a string is at position &lt;strong>1&lt;/strong>. A return value of &lt;strong>0&lt;/strong> from &lt;code>position()&lt;/code> indicates that the substring &lt;strong>could not be found&lt;/strong>, not that the substring was found at the first position in the string.&lt;/p>
&lt;p>If you want to start your search at something &lt;strong>other than the first character&lt;/strong> of your target string, you will need to use the &lt;code>locate()&lt;/code> function, which is similar to the &lt;code>position()&lt;/code> function except that it allows an optional &lt;strong>third parameter&lt;/strong>, which is used to define the search’s start position. The &lt;code>locate()&lt;/code> function is also proprietary, whereas the &lt;code>position()&lt;/code> function is part of the SQL:2003 standard.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT LOCATE('is', vchar_fld, 5)
FROM string_tbl;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">match('y',x[5:])
which('y' %in% x[5:])
&lt;/code>&lt;/pre>
&lt;p>Oracle Database
&lt;code>instr()&lt;/code>: Mimics the &lt;code>position()&lt;/code> function when provided with two arguments and mimics the &lt;code>locate()&lt;/code> function when provided with three arguments.&lt;/p>
&lt;p>SQL Server
&lt;code>charindex()&lt;/code>: similar to Oracle’s &lt;code>instr()&lt;/code> function.&lt;/p>
&lt;p>&lt;code>strcmp()&lt;/code> (MySQL ONLY) takes two strings as arguments and returns one of the following:&lt;/p>
&lt;ul>
&lt;li>−1 if the first string comes before the second string in sort order&lt;/li>
&lt;li>0 if the strings are identical&lt;/li>
&lt;li>1 if the first string comes after the second string in sort order&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-sql">SELECT vchar_fld
FROM string_tbl
ORDER BY vchar_fld;
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">vchar_fld&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">12345&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">abcd&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">QRSTUV&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">qrstuv&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">xyz&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-sql">SELECT STRCMP('12345','12345') 12345_12345,
STRCMP('abcd','xyz') abcd_xyz,
STRCMP('abcd','QRSTUV') abcd_QRSTUV,
STRCMP('qrstuv','QRSTUV') qrstuv_QRSTUV, /*Case insensitive*/
STRCMP('12345','xyz') 12345_xyz,
STRCMP('xyz','qrstuv') xyz_qrstuv;
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">12345_12345&lt;/th>
&lt;th>abcd_xyz&lt;/th>
&lt;th>abcd_QRSTUV&lt;/th>
&lt;th>qrstuv_QRSTUV&lt;/th>
&lt;th>12345_xyz&lt;/th>
&lt;th align="right">xyz_qrstuv&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">0&lt;/td>
&lt;td>−1&lt;/td>
&lt;td>−1&lt;/td>
&lt;td>0&lt;/td>
&lt;td>−1&lt;/td>
&lt;td align="right">1&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>To add or replace characters in the &lt;em>middle&lt;/em> of a string, use &lt;code>insert()&lt;/code>, which takes four parameters: the original string, the start position, the number of characters to replace (0 to insert without replacing), and the replacement string.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT INSERT('goodbye world', 9, 0, 'cruel ') string;
/*goodbye cruel world*/
SELECT INSERT('goodbye world', 1, 7, 'hello') string;
/*hello world*/
SELECT SUBSTRING('goodbye cruel world', 9, 5);
/*cruel*/
&lt;/code>&lt;/pre>
&lt;p>For other database servers:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*Oracle*/
SELECT REPLACE('goodbye world', 'goodbye', 'hello') FROM dual;
/*hello world*/
SELECT SUBSTR('goodbye cruel world', 9, 5) FROM dual;
/*cruel*/
/*SQL Server*/
SELECT STUFF('hello world', 1, 5, 'goodbye cruel')
/*goodbye cruel world*/
SELECT SUBSTRING('goodbye cruel world', 9, 5);
/*cruel*/
&lt;/code>&lt;/pre>
&lt;h2 id="working-with-numeric-data">Working with Numeric Data&lt;/h2>
&lt;pre>&lt;code class="language-sql">SELECT (37 * 59) / (78 - (8 * 6));
&lt;/code>&lt;/pre>
&lt;h3 id="performing-arithmetic-functions--controlling-number-precision--handling-signed-data">Performing Arithmetic Functions &amp;amp; Controlling Number Precision &amp;amp; Handling Signed Data&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Function name&lt;/th>
&lt;th align="right">Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">acos( x )&lt;/td>
&lt;td align="right">Calculates the arc cosine of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">asin( x )&lt;/td>
&lt;td align="right">Calculates the arc sine of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">atan( x )&lt;/td>
&lt;td align="right">Calculates the arc tangent of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">cos( x )&lt;/td>
&lt;td align="right">Calculates the cosine of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">sin( x )&lt;/td>
&lt;td align="right">Calculates the sine of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">tan( x )&lt;/td>
&lt;td align="right">Calculates the tangent of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">cot( x )&lt;/td>
&lt;td align="right">Calculates the cotangent of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">exp( x )&lt;/td>
&lt;td align="right">Calculates ex&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">ln( x )&lt;/td>
&lt;td align="right">Calculates the natural log of x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">sqrt( x )&lt;/td>
&lt;td align="right">Calculates the square root of x&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Some useful functions in R and SQL (See Appendix for full results):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">SQL&lt;/th>
&lt;th align="right">R&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">MOD( x )&lt;/td>
&lt;td align="right">%%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">POW( x )&lt;/td>
&lt;td align="right">^&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">CEIL( x )&lt;/td>
&lt;td align="right">ceiling()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">FLOOR( x )&lt;/td>
&lt;td align="right">floor()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">ROUND( x )&lt;/td>
&lt;td align="right">round()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">TRUNCATE( x )&lt;/td>
&lt;td align="right">trunc()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">SIGN( x )&lt;/td>
&lt;td align="right">sign()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">ABS( x )&lt;/td>
&lt;td align="right">abs()&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="working-with-temporal-data">Working with Temporal Data&lt;/h2>
&lt;h3 id="dealing-with-time-zones">Dealing with Time Zones&lt;/h3>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
SELECT @@global.time_zone, @@session.time_zone;
SET time_zone = 'Europe/Zurich';
/*Oracle Database*/
ALTER SESSION SET TIME_ZONE = 'Europe/Zurich';
&lt;/code>&lt;/pre>
&lt;p>From:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">@@global.time_zone&lt;/th>
&lt;th align="right">@@session.time_zone&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">SYSTEM&lt;/td>
&lt;td align="right">SYSTEM&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>To:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">@@global.time_zone&lt;/th>
&lt;th align="right">@@session.time_zone&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">SYSTEM&lt;/td>
&lt;td align="right">Europe/Zurich&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">Sys.timezone()
Sys.setenv(TZ = &amp;quot;Europe/Zurich&amp;quot;)
&lt;/code>&lt;/pre>
&lt;h3 id="generating-temporal-data">Generating Temporal Data&lt;/h3>
&lt;p>You can generate temporal data via any of the following means:&lt;/p>
&lt;ul>
&lt;li>Copying data from an existing date, datetime, or time column&lt;/li>
&lt;li>Executing a built-in function that returns a date, datetime, or time&lt;/li>
&lt;li>Building a string representation of the temporal data to be evaluated by the server&lt;/li>
&lt;/ul>
&lt;h4 id="string-representations-of-temporal-data">String representations of temporal data&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Component&lt;/th>
&lt;th>Definition&lt;/th>
&lt;th align="right">Range&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">YYYY&lt;/td>
&lt;td>Year, including century&lt;/td>
&lt;td align="right">1000 to 9999&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">MM&lt;/td>
&lt;td>Month&lt;/td>
&lt;td align="right">01 (January) to 12 (December)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">DD&lt;/td>
&lt;td>Day&lt;/td>
&lt;td align="right">01 to 31&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">HH&lt;/td>
&lt;td>Hour&lt;/td>
&lt;td align="right">Range 00 to 23&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">HHH&lt;/td>
&lt;td>Hours&lt;/td>
&lt;td align="right">−838 to 838&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">MI&lt;/td>
&lt;td>(elapsed) Minute&lt;/td>
&lt;td align="right">00 to 59&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">SS&lt;/td>
&lt;td>Second&lt;/td>
&lt;td align="right">00 to 59&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Type&lt;/th>
&lt;th align="right">Default format&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">date&lt;/td>
&lt;td align="right">YYYY-MM-DD&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">datetime&lt;/td>
&lt;td align="right">YYYY-MM-DD HH:MI:SS&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">timestamp&lt;/td>
&lt;td align="right">YYYY-MM-DD HH:MI:SS&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">time&lt;/td>
&lt;td align="right">HHH:MI:SS&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="string-to-date-conversions">String-to-date conversions&lt;/h4>
&lt;ul>
&lt;li>A simple query that returns a datetime value using the &lt;code>cast()&lt;/code> function&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">SQL&lt;/th>
&lt;th align="right">R (lubridate)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">CAST(&amp;lsquo;2019-09-17 15:30:00&amp;rsquo; AS DATETIME)&lt;/td>
&lt;td align="right">as_datetime()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">STR_TO_DATE(&amp;lsquo;September 17, 2019&amp;rsquo;, &amp;lsquo;%M %d, %Y&amp;rsquo;)&lt;/td>
&lt;td align="right">as.Date(&amp;hellip;, format=&amp;hellip;)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">CAST(&amp;lsquo;2019-09-17&amp;rsquo; AS DATE)&lt;/td>
&lt;td align="right">as.Date()&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">CAST(&amp;lsquo;108:17:57&amp;rsquo; AS TIME)&lt;/td>
&lt;td align="right">as.POSIXlt()&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
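SELECT STR_TO_DATE('September 17, 2019', '%M %d, %Y');
/*Oracle Database*/
SELECT TO_DATE('September 17, 2019', 'Month DD, YYYY') FROM dual;
/*SQL Server*/
SELECT CONVERT(DATE, 'September 17, 2019');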
/*Current System Time*/
SELECT CURRENT_DATE(), CURRENT_TIME(), CURRENT_TIMESTAMP();
&lt;/code>&lt;/pre>
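&lt;p>MySQL date format components (R's &lt;code>strptime()&lt;/code> shares some of these but differs on others, e.g. it uses &lt;code>%B&lt;/code> for the month name and &lt;code>%M&lt;/code> for minutes):&lt;/p>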
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Format component&lt;/th>
&lt;th align="right">Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">%M&lt;/td>
&lt;td align="right">Month name (January to December)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%m&lt;/td>
&lt;td align="right">Month numeric (01 to 12)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%d&lt;/td>
&lt;td align="right">Day numeric (01 to 31)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%j&lt;/td>
&lt;td align="right">Day of year (001 to 366)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%W&lt;/td>
&lt;td align="right">Weekday name (Sunday to Saturday)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%Y&lt;/td>
&lt;td align="right">Year, four-digit numeric&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%y&lt;/td>
&lt;td align="right">Year, two-digit numeric&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%H&lt;/td>
&lt;td align="right">Hour (00 to 23)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%h&lt;/td>
&lt;td align="right">Hour (01 to 12)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%i&lt;/td>
&lt;td align="right">Minutes (00 to 59)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%s&lt;/td>
&lt;td align="right">Seconds (00 to 59)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%f&lt;/td>
&lt;td align="right">Microseconds (000000 to 999999)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%p&lt;/td>
&lt;td align="right">A.M. or P.M.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
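&lt;p>As a short illustration of how these components combine, here is a sketch using MySQL's &lt;code>STR_TO_DATE()&lt;/code> to parse a string and &lt;code>DATE_FORMAT()&lt;/code> to go the other way. The literal values are only examples:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*Parse a string into a date*/
SELECT STR_TO_DATE('September 17, 2019', '%M %d, %Y');
/*2019-09-17*/
/*Format a datetime as a string*/
SELECT DATE_FORMAT('2019-09-17 15:30:00', '%W, %M %d, %Y %H:%i');
/*Tuesday, September 17, 2019 15:30*/
&lt;/code>&lt;/pre>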
&lt;h3 id="manipulating-temporal-data">Manipulating Temporal Data&lt;/h3>
&lt;p>&lt;strong>Interval types for &lt;code>DATE_ADD()&lt;/code> and &lt;code>EXTRACT()&lt;/code>&lt;/strong>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Interval name&lt;/th>
&lt;th align="right">Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">second&lt;/td>
&lt;td align="right">Number of seconds&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">minute&lt;/td>
&lt;td align="right">Number of minutes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">hour&lt;/td>
&lt;td align="right">Number of hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">day&lt;/td>
&lt;td align="right">Number of days&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">month&lt;/td>
&lt;td align="right">Number of months&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">year&lt;/td>
&lt;td align="right">Number of years&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">minute_second&lt;/td>
&lt;td align="right">Number of minutes and seconds, separated by “:”&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">hour_second&lt;/td>
&lt;td align="right">Number of hours, minutes, and seconds, separated by “:”&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">year_month&lt;/td>
&lt;td align="right">Number of years and months, separated by “-”&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
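&lt;p>A brief sketch of how a simple interval type and a compound one are written in MySQL's &lt;code>DATE_ADD()&lt;/code>. The dates are arbitrary examples; note that a compound interval is passed as a quoted string:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*Simple interval: add 5 days*/
SELECT DATE_ADD('2019-09-17', INTERVAL 5 DAY);
/*2019-09-22*/
/*Compound interval: add 3 hours, 27 minutes, and 11 seconds*/
SELECT DATE_ADD('2019-09-17 15:30:00', INTERVAL '3:27:11' HOUR_SECOND);
/*2019-09-17 18:57:11*/
&lt;/code>&lt;/pre>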
&lt;h4 id="temporal-functions-that-return-dates">Temporal functions that return dates&lt;/h4>
&lt;p>The same update (adding 9 years and 11 months, i.e. 119 months, to a birth date) can be written for each of the three servers:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
UPDATE employee
SET birth_date = DATE_ADD(birth_date, INTERVAL '9-11' YEAR_MONTH)
WHERE emp_id = 4789;
/*Oracle Database*/
UPDATE employee
SET birth_date = ADD_MONTHS(birth_date, 119)
WHERE emp_id = 4789;
/*SQL server*/
UPDATE employee
SET birth_date = DATEADD(MONTH, 119, birth_date)
WHERE emp_id = 4789
&lt;/code>&lt;/pre>
&lt;h4 id="temporal-functions-that-return-strings">Temporal functions that return strings&lt;/h4>
&lt;p>Some other functions for temporal data:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*MySQL*/
SELECT LAST_DAY('2019-09-17'); /*2019-09-30, the last day of September*/
SELECT DAYNAME('2019-09-18'); /*Wednesday*/
SELECT EXTRACT(YEAR FROM '2019-09-18 22:19:05'); /*2019*/
/*SQL Server*/
SELECT DATEPART(YEAR, GETDATE())
&lt;/code>&lt;/pre>
&lt;h4 id="temporal-functions-that-return-numbers">Temporal functions that return numbers&lt;/h4>
&lt;pre>&lt;code class="language-sql">SELECT DATEDIFF('2019-09-03', '2019-06-21');
/*74*/
SELECT DATEDIFF('2019-09-03 23:59:59', '2019-06-21 00:00:01');
/*74, time has no effects*/
SELECT DATEDIFF('2019-06-21', '2019-09-03');
/*-74*/
/*SQL Server*/
SELECT DATEDIFF(DAY, '2019-06-21', '2019-09-03')
&lt;/code>&lt;/pre>
&lt;h3 id="conversion-functions">Conversion Functions&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT CAST('1456328' AS SIGNED INTEGER);
/*1456328*/
SELECT CAST('999ABC111' AS UNSIGNED INTEGER);
/*999 with warnings about truncation*/
&lt;/code>&lt;/pre>
&lt;h2 id="appendix-for-codes">Appendix for Codes&lt;/h2>
&lt;pre>&lt;code class="language-sql">SELECT MOD(10,4);
/*2*/
SELECT MOD(20.75,4); /*Real argument*/
/*0.75*/
SELECT POW(2,8);
/*256*/
SELECT CEIL(72.445), FLOOR(72.445);
/*73 72*/
SELECT CEIL(72.000000001), FLOOR(72.999999999);
/*73 72*/
SELECT ROUND(72.49999), ROUND(72.5), ROUND(72.50001);
/*72 73 73*/
SELECT ROUND(72.0909, 1), ROUND(72.0909, 2), ROUND(72.0909, 3);
/*72.1 72.09 72.091*/
SELECT TRUNCATE(72.0909, 1), TRUNCATE(72.0909, 2), TRUNCATE(72.0909, 3);
/*72.0 72.09 72.090*/
/*SQL Server*/
SELECT ROUND(72.0909, 1, 1)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">%%
^
ceiling()
floor()
round()
trunc()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-sql">SELECT account_id, SIGN(balance), ABS(balance)
FROM account;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">sign()
abs()
&lt;/code>&lt;/pre>
&lt;p>Hope I can finish this before July. Stay safe.&lt;/p>
&lt;p>&lt;img src="2.gif" alt="">&lt;/p></description></item><item><title>Learning SQL Notes #5: Querying Multiple Tables (CH. 5)</title><link>https://siqi-zheng.rbind.io/post/2021-06-03-sql-notes-5/</link><pubDate>Thu, 03 Jun 2021 20:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-03-sql-notes-5/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#cross-join-cartesian-product">Cross Join (Cartesian Product)&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#inner-joins">Inner Joins&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#joining-three-or-more-tables">Joining Three or More Tables&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#using-subqueries-as-tables">Using Subqueries as Tables&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#using-the-same-table-twice">Using the Same Table Twice&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#self-joins">Self-Joins&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#outer-joins">Outer Joins&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#three-way-outer-joins">Three-Way Outer Joins&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#natural-joins">Natural Joins&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>A join instructs the server to use a column as the &lt;em>transportation&lt;/em> between tables, thus allowing columns from both tables to be included in the query’s result set.&lt;/p>
&lt;h2 id="cross-join-cartesian-product">Cross Join (Cartesian Product)&lt;/h2>
&lt;p>If a query doesn’t specify how the two tables should be joined, the database server generates the &lt;em>Cartesian product&lt;/em>, which is &lt;strong>every combination&lt;/strong> of rows from the two tables.&lt;/p>
&lt;pre>&lt;code class="language-sql">JOIN b
CROSS JOIN b
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">merge(x = df1, y = df2, by = NULL)
library(data.table)
CJ(a, b)
&lt;/code>&lt;/pre>
&lt;p>Cross joins can be used to generate a list of consecutive numbers.&lt;/p>
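&lt;p>A minimal sketch of that idea: cross joining a derived table of ones (0 to 9) with a derived table of tens (0, 10, 20) yields every sum from 0 to 29. The aliases &lt;code>ones&lt;/code>, &lt;code>tens&lt;/code>, and &lt;code>num&lt;/code> are illustrative names only:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT ones.num + tens.num AS num
FROM
 (SELECT 0 num UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
  SELECT 9) ones
 CROSS JOIN
 (SELECT 0 num UNION ALL SELECT 10 UNION ALL SELECT 20) tens
ORDER BY num;
/*30 rows: 0, 1, 2, ..., 29*/
&lt;/code>&lt;/pre>
&lt;p>Adding more rows to &lt;code>tens&lt;/code>, or a third derived table of hundreds, extends the range in the same way.&lt;/p>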
&lt;h2 id="inner-joins">Inner Joins&lt;/h2>
&lt;p>If a value exists for the address_id column in one table but &lt;em>not&lt;/em> the other, then the join fails for the rows containing that value, and those rows are &lt;strong>excluded&lt;/strong> from the result set. Inner join only returns rows that satisfy the &lt;strong>join condition&lt;/strong>.&lt;/p>
&lt;pre>
INNER JOIN b
&lt;b>ON a.id=b.id&lt;/b>
&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> codes:&lt;/p>
&lt;pre>&lt;code class="language-r">merge(df1, df2, by = &amp;quot;id&amp;quot;)
library(plyr)
join(df1, df2,
type = &amp;quot;inner&amp;quot;)
&lt;/code>&lt;/pre>
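&lt;p>For a concrete case, here is a sketch of an inner join on the &lt;code>address_id&lt;/code> column mentioned above, assuming the &lt;code>customer&lt;/code> and &lt;code>address&lt;/code> tables of the Sakila sample schema:&lt;/p>
&lt;pre>&lt;code class="language-sql">/*Customers without a matching address row are excluded*/
SELECT c.first_name, c.last_name, a.address
FROM customer c
 INNER JOIN address a
 ON c.address_id = a.address_id;
&lt;/code>&lt;/pre>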
&lt;h2 id="joining-three-or-more-tables">Joining Three or More Tables&lt;/h2>
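&lt;p>The order in which tables appear in the FROM clause does not matter: the server chooses the join order itself.&lt;/p>
&lt;p>To force the server to join tables in the order written (MySQL):&lt;/p>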
&lt;pre>&lt;code class="language-sql">SELECT STRAIGHT_JOIN COL1
&lt;/code>&lt;/pre>
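&lt;p>A fuller sketch of the same keyword in a three-table join, assuming the Sakila &lt;code>city&lt;/code>, &lt;code>address&lt;/code>, and &lt;code>customer&lt;/code> tables. With &lt;code>STRAIGHT_JOIN&lt;/code>, MySQL uses &lt;code>city&lt;/code> as the driving table because it is listed first:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT STRAIGHT_JOIN c.first_name, c.last_name, ct.city
FROM city ct
 INNER JOIN address a
 ON a.city_id = ct.city_id
 INNER JOIN customer c
 ON c.address_id = a.address_id;
&lt;/code>&lt;/pre>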
&lt;h2 id="using-subqueries-as-tables">Using Subqueries as Tables&lt;/h2>
&lt;p>See subquery notes.&lt;/p>
&lt;h2 id="using-the-same-table-twice">Using the Same Table Twice&lt;/h2>
&lt;p>To find films in which either one of two actors appears:&lt;/p>
&lt;pre>&lt;code class="language-SQL">SELECT f.title
FROM film f
INNER JOIN film_actor fa
ON f.film_id = fa.film_id
INNER JOIN actor a
ON fa.actor_id = a.actor_id
WHERE (a.first_name = 'CATE' AND a.last_name = 'MCQUEEN')
OR (a.first_name = 'CUBA' AND a.last_name = 'BIRCH');
&lt;/code>&lt;/pre>
&lt;p>To find films in which both actors appear, you cannot simply replace OR with AND: no single actor row can match both names, so the query would return an empty set. Instead, you need to join the film_actor and actor tables twice:&lt;/p>
&lt;pre>&lt;code class="language-SQL">SELECT f.title
FROM film f
/*once: */
INNER JOIN film_actor fa1
ON f.film_id = fa1.film_id
INNER JOIN actor a1
ON fa1.actor_id = a1.actor_id
/*twice: */
INNER JOIN film_actor fa2
ON f.film_id = fa2.film_id
INNER JOIN actor a2
ON fa2.actor_id = a2.actor_id
/*filter condition is applied*/
WHERE (a1.first_name = 'CATE' AND a1.last_name = 'MCQUEEN')
AND (a2.first_name = 'CUBA' AND a2.last_name = 'BIRCH');
&lt;/code>&lt;/pre>
&lt;h2 id="self-joins">Self-Joins&lt;/h2>
&lt;p>Some tables include a self-referencing foreign key, which means that they include a column that points to the primary key within the same table.&lt;/p>
&lt;p>Imagine that the film table includes the column prequel_film_id, which points to the film’s parent (e.g., the film Fiddler Lost II would use this column to point to the parent film Fiddler Lost).&lt;/p>
&lt;p>Using a self-join, you can write a query that lists every film that has a prequel, along with the prequel’s title:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT f.title, f_prnt.title prequel
FROM film f
INNER JOIN film f_prnt
ON f_prnt.film_id = f.prequel_film_id
WHERE f.prequel_film_id IS NOT NULL;
&lt;/code>&lt;/pre>
&lt;p>A possible outcome:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">title&lt;/th>
&lt;th align="right">prequel&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">FIDDLER LOST II&lt;/td>
&lt;td align="right">FIDDLER LOST&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="outer-joins">Outer Joins&lt;/h2>
&lt;pre>&lt;code class="language-sql">SELECT f.film_id, f.title, count(i.inventory_id) num_copies
FROM film f
LEFT OUTER JOIN inventory i
ON f.film_id = i.film_id
GROUP BY f.film_id, f.title;
&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>
&lt;p>A left outer join includes all rows from the table on the left side of the join (film, in this case) and then includes columns from the table on the right side of the join (inventory) if the join is successful; otherwise those columns are null.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The num_copies column is defined as count(i.inventory_id) rather than count(*), so it counts only the non-null values of the inventory.inventory_id column. A film with no copies in inventory therefore gets 0 instead of 1.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A left outer join B $\equiv$ B right outer join A.&lt;/p>
&lt;/li>
&lt;/ul>
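&lt;p>To illustrate the last point, here is a sketch of the same query rewritten as a right outer join, with the two tables swapped. It should return the same rows as the left outer join above:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT f.film_id, f.title, count(i.inventory_id) num_copies
FROM inventory i
 RIGHT OUTER JOIN film f
 ON f.film_id = i.film_id
GROUP BY f.film_id, f.title;
&lt;/code>&lt;/pre>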
&lt;h3 id="three-way-outer-joins">Three-Way Outer Joins&lt;/h3>
&lt;pre>
SELECT f.film_id, f.title, i.inventory_id, r.rental_date
FROM film f LEFT OUTER JOIN inventory i
ON f.film_id = i.film_id
&lt;b>LEFT OUTER JOIN rental r
ON i.inventory_id = r.inventory_id&lt;/b>
WHERE f.film_id BETWEEN 13 AND 15;
&lt;/pre>
&lt;h2 id="natural-joins">Natural Joins&lt;/h2>
&lt;p>A natural join lets the database server determine what the join conditions need to be, based on identically named columns in the two tables.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name, date(r.rental_date)
FROM customer c
NATURAL JOIN rental r;
&lt;/code>&lt;/pre>
&lt;p>Empty set (0.04 sec)&lt;/p>
&lt;p>Because you specified a natural join, the server inspected the table definitions and added the join condition r.customer_id = c.customer_id to join the two tables. This would have worked fine, but in the Sakila schema all of the tables include the column last_update to show when each row was last modified, so the server is also adding the join condition r.last_update = c.last_update, which causes the query to return no data.&lt;/p>
&lt;p>The only way around this issue is to use a subquery to restrict the columns for at least one of the tables:&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT cust.first_name, cust.last_name, date(r.rental_date)
FROM
(SELECT customer_id, first_name, last_name
FROM customer
) cust
NATURAL JOIN rental r;
&lt;/code>&lt;/pre></description></item><item><title>Learning SQL Notes #4.5: Regular Expression</title><link>https://siqi-zheng.rbind.io/post/2021-06-02-sql-notes-4-5/</link><pubDate>Wed, 02 Jun 2021 20:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-06-02-sql-notes-4-5/</guid><description>&lt;p>Adapted from &lt;a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference">https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference&lt;/a>&lt;/p>
&lt;h2 id="character-escapes">Character Escapes&lt;/h2>
&lt;p>The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Escaped character&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>\a&lt;/code>&lt;/td>
&lt;td>Matches a bell character, \u0007.&lt;/td>
&lt;td>&lt;code>\a&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\u0007&amp;quot;&lt;/code> in &lt;code>&amp;quot;Error!&amp;quot; + '\u0007'&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\b&lt;/code>&lt;/td>
&lt;td>In a character class, matches a backspace, \u0008.&lt;/td>
&lt;td>&lt;code>[\b]{3,}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\b\b\b\b&amp;quot;&lt;/code> in &lt;code>&amp;quot;\b\b\b\b&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\t&lt;/code>&lt;/td>
&lt;td>Matches a tab, \u0009.&lt;/td>
&lt;td>&lt;code>(\w+)\t&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;item1\t&amp;quot;&lt;/code>, &lt;code>&amp;quot;item2\t&amp;quot;&lt;/code> in &lt;code>&amp;quot;item1\titem2\t&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\r&lt;/code>&lt;/td>
&lt;td>Matches a carriage return, \u000D. (&lt;code>\r&lt;/code> is not equivalent to the newline character, &lt;code>\n&lt;/code>.)&lt;/td>
&lt;td>&lt;code>\r\n(\w+)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\r\nThese&amp;quot;&lt;/code> in &lt;code>&amp;quot;\r\nThese are\ntwo lines.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\v&lt;/code>&lt;/td>
&lt;td>Matches a vertical tab, \u000B.&lt;/td>
&lt;td>&lt;code>[\v]{2,}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\v\v\v&amp;quot;&lt;/code> in &lt;code>&amp;quot;\v\v\v&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\f&lt;/code>&lt;/td>
&lt;td>Matches a form feed, \u000C.&lt;/td>
&lt;td>&lt;code>[\f]{2,}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\f\f\f&amp;quot;&lt;/code> in &lt;code>&amp;quot;\f\f\f&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\n&lt;/code>&lt;/td>
&lt;td>Matches a new line, \u000A.&lt;/td>
&lt;td>&lt;code>\r\n(\w+)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\r\nThese&amp;quot;&lt;/code> in &lt;code>&amp;quot;\r\nThese are\ntwo lines.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\e&lt;/code>&lt;/td>
&lt;td>Matches an escape, \u001B.&lt;/td>
&lt;td>&lt;code>\e&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\x001B&amp;quot;&lt;/code> in &lt;code>&amp;quot;\x001B&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\&lt;/code> &lt;em>nnn&lt;/em>&lt;/td>
&lt;td>Uses octal representation to specify a character (&lt;em>nnn&lt;/em> consists of two or three digits).&lt;/td>
&lt;td>&lt;code>\w\040\w&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;a b&amp;quot;&lt;/code>, &lt;code>&amp;quot;c d&amp;quot;&lt;/code> in &lt;code>&amp;quot;a bc d&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\x&lt;/code> &lt;em>nn&lt;/em>&lt;/td>
&lt;td>Uses hexadecimal representation to specify a character (&lt;em>nn&lt;/em> consists of exactly two digits).&lt;/td>
&lt;td>&lt;code>\w\x20\w&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;a b&amp;quot;&lt;/code>, &lt;code>&amp;quot;c d&amp;quot;&lt;/code> in &lt;code>&amp;quot;a bc d&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\c&lt;/code> &lt;em>X&lt;/em>&lt;br/>&lt;br/> &lt;code>\c&lt;/code> &lt;em>x&lt;/em>&lt;/td>
&lt;td>Matches the ASCII control character that is specified by &lt;em>X&lt;/em> or &lt;em>x&lt;/em>, where &lt;em>X&lt;/em> or &lt;em>x&lt;/em> is the letter of the control character.&lt;/td>
&lt;td>&lt;code>\cC&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;\x0003&amp;quot;&lt;/code> in &lt;code>&amp;quot;\x0003&amp;quot;&lt;/code> (Ctrl-C)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\u&lt;/code> &lt;em>nnnn&lt;/em>&lt;/td>
&lt;td>Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by &lt;em>nnnn&lt;/em>).&lt;/td>
&lt;td>&lt;code>\w\u0020\w&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;a b&amp;quot;&lt;/code>, &lt;code>&amp;quot;c d&amp;quot;&lt;/code> in &lt;code>&amp;quot;a bc d&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\&lt;/code>&lt;/td>
&lt;td>When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, &lt;code>\*&lt;/code> is the same as &lt;code>\x2A&lt;/code>, and &lt;code>\.&lt;/code> is the same as &lt;code>\x2E&lt;/code>. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by &lt;code>\*&lt;/code> or &lt;code>\?&lt;/code>).&lt;/td>
&lt;td>&lt;code>\d+[\+-x\*]\d+&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;2+2&amp;quot;&lt;/code> and &lt;code>&amp;quot;3*9&amp;quot;&lt;/code> in &lt;code>&amp;quot;(2+2) * 3*9&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
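&lt;p>As a rough illustration of the numeric escapes in the table above, the following sketch uses Python's standard &lt;code>re&lt;/code> module rather than the .NET engine the table describes. The three escapes shown happen to behave the same way there, but that is an assumption about this one case, not a general equivalence. The last pattern and the variable names are illustrative only.&lt;/p>

```python
import re

# Octal, hexadecimal, and Unicode escapes for the space character (U+0020).
sample = "a bc d"
by_octal = re.findall(r"\w\040\w", sample)
by_hex = re.findall(r"\w\x20\w", sample)
by_unicode = re.findall(r"\w\u0020\w", sample)

# A backslash before a metacharacter makes it a literal: \* matches "*".
# This pattern is illustrative and is not taken from the table.
products = re.findall(r"\d+\*\d+", "(2+2) * 3*9")

print(by_octal, by_hex, by_unicode)  # each list is ['a b', 'c d']
print(products)                      # ['3*9']
```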
&lt;h2 id="character-classes">Character Classes&lt;/h2>
&lt;p>A character class matches any one of a set of characters. Character classes include the language elements listed in the following table. For more information, see &lt;a href="character-classes-in-regular-expressions" data-linktype="relative-path">Character Classes&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Character class&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>[&lt;/code> &lt;em>character_group&lt;/em> &lt;code>]&lt;/code>&lt;/td>
&lt;td>Matches any single character in &lt;em>character_group&lt;/em>. By default, the match is case-sensitive.&lt;/td>
&lt;td>&lt;code>[ae]&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;a&amp;quot;&lt;/code> in &lt;code>&amp;quot;gray&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;a&amp;quot;&lt;/code>, &lt;code>&amp;quot;e&amp;quot;&lt;/code> in &lt;code>&amp;quot;lane&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>[^&lt;/code> &lt;em>character_group&lt;/em> &lt;code>]&lt;/code>&lt;/td>
&lt;td>Negation: Matches any single character that is not in &lt;em>character_group&lt;/em>. By default, characters in &lt;em>character_group&lt;/em> are case-sensitive.&lt;/td>
&lt;td>&lt;code>[^aei]&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;r&amp;quot;&lt;/code>, &lt;code>&amp;quot;g&amp;quot;&lt;/code>, &lt;code>&amp;quot;n&amp;quot;&lt;/code> in &lt;code>&amp;quot;reign&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>[&lt;/code> &lt;em>first&lt;/em> &lt;code>-&lt;/code> &lt;em>last&lt;/em> &lt;code>]&lt;/code>&lt;/td>
&lt;td>Character range: Matches any single character in the range from &lt;em>first&lt;/em> to &lt;em>last&lt;/em>.&lt;/td>
&lt;td>&lt;code>[A-Z]&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;A&amp;quot;&lt;/code>, &lt;code>&amp;quot;B&amp;quot;&lt;/code> in &lt;code>&amp;quot;AB123&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>.&lt;/code>&lt;/td>
&lt;td>Wildcard: Matches any single character except \n.&lt;br/>&lt;br/> To match a literal period character (. or &lt;code>\u002E&lt;/code>), you must precede it with the escape character (&lt;code>\.&lt;/code>).&lt;/td>
&lt;td>&lt;code>a.e&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ave&amp;quot;&lt;/code> in &lt;code>&amp;quot;nave&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;ate&amp;quot;&lt;/code> in &lt;code>&amp;quot;water&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\p{&lt;/code> &lt;em>name&lt;/em> &lt;code>}&lt;/code>&lt;/td>
&lt;td>Matches any single character in the Unicode general category or named block specified by &lt;em>name&lt;/em>.&lt;/td>
&lt;td>&lt;code>\p{Lu}&lt;/code>&lt;br/>&lt;br/> &lt;code>\p{IsCyrillic}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;C&amp;quot;&lt;/code>, &lt;code>&amp;quot;L&amp;quot;&lt;/code> in &lt;code>&amp;quot;City Lights&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;Д&amp;quot;&lt;/code>, &lt;code>&amp;quot;Ж&amp;quot;&lt;/code> in &lt;code>&amp;quot;ДЖem&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\P{&lt;/code> &lt;em>name&lt;/em> &lt;code>}&lt;/code>&lt;/td>
&lt;td>Matches any single character that is not in the Unicode general category or named block specified by &lt;em>name&lt;/em>.&lt;/td>
&lt;td>&lt;code>\P{Lu}&lt;/code>&lt;br/>&lt;br/> &lt;code>\P{IsCyrillic}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;i&amp;quot;&lt;/code>, &lt;code>&amp;quot;t&amp;quot;&lt;/code>, &lt;code>&amp;quot;y&amp;quot;&lt;/code> in &lt;code>&amp;quot;City&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;e&amp;quot;&lt;/code>, &lt;code>&amp;quot;m&amp;quot;&lt;/code> in &lt;code>&amp;quot;ДЖem&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\w&lt;/code>&lt;/td>
&lt;td>Matches any word character.&lt;/td>
&lt;td>&lt;code>\w&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;I&amp;quot;&lt;/code>, &lt;code>&amp;quot;D&amp;quot;&lt;/code>, &lt;code>&amp;quot;A&amp;quot;&lt;/code>, &lt;code>&amp;quot;1&amp;quot;&lt;/code>, &lt;code>&amp;quot;3&amp;quot;&lt;/code> in &lt;code>&amp;quot;ID A1.3&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\W&lt;/code>&lt;/td>
&lt;td>Matches any non-word character.&lt;/td>
&lt;td>&lt;code>\W&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot; &amp;quot;&lt;/code>, &lt;code>&amp;quot;.&amp;quot;&lt;/code> in &lt;code>&amp;quot;ID A1.3&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\s&lt;/code>&lt;/td>
&lt;td>Matches any white-space character.&lt;/td>
&lt;td>&lt;code>\w\s&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;D &amp;quot;&lt;/code> in &lt;code>&amp;quot;ID A1.3&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\S&lt;/code>&lt;/td>
&lt;td>Matches any non-white-space character.&lt;/td>
&lt;td>&lt;code>\s\S&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot; _&amp;quot;&lt;/code> in &lt;code>&amp;quot;int __ctr&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\d&lt;/code>&lt;/td>
&lt;td>Matches any decimal digit.&lt;/td>
&lt;td>&lt;code>\d&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;4&amp;quot;&lt;/code> in &lt;code>&amp;quot;4 = IV&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\D&lt;/code>&lt;/td>
&lt;td>Matches any character other than a decimal digit.&lt;/td>
&lt;td>&lt;code>\D&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot; &amp;quot;&lt;/code>, &lt;code>&amp;quot;=&amp;quot;&lt;/code>, &lt;code>&amp;quot; &amp;quot;&lt;/code>, &lt;code>&amp;quot;I&amp;quot;&lt;/code>, &lt;code>&amp;quot;V&amp;quot;&lt;/code> in &lt;code>&amp;quot;4 = IV&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
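&lt;p>The character classes above can be checked quickly with a short sketch. It uses Python's standard &lt;code>re&lt;/code> module as a stand-in for the .NET engine, so treat it as an approximation: the Unicode category classes (&lt;code>\p{...}&lt;/code>) are not available in that module and are left out. The input &lt;code>"nave water"&lt;/code> combines two rows of the table into one string.&lt;/p>

```python
import re

# Patterns and inputs follow the rows of the table above.
vowels_in_lane = re.findall(r"[ae]", "lane")        # positive group
not_aei_in_reign = re.findall(r"[^aei]", "reign")   # negated group
uppercase_range = re.findall(r"[A-Z]", "AB123")     # character range
wildcard = re.findall(r"a.e", "nave water")         # any character except newline
word_chars = re.findall(r"\w", "ID A1.3")           # word characters
non_digits = re.findall(r"\D", "4 = IV")            # anything but a decimal digit

print(vowels_in_lane)    # ['a', 'e']
print(not_aei_in_reign)  # ['r', 'g', 'n']
print(uppercase_range)   # ['A', 'B']
print(wildcard)          # ['ave', 'ate']
print(word_chars)        # ['I', 'D', 'A', '1', '3']
print(non_digits)        # [' ', '=', ' ', 'I', 'V']
```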
&lt;h2 id="anchors">Anchors&lt;/h2>
&lt;p>Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors. For more information, see &lt;a href="anchors-in-regular-expressions" data-linktype="relative-path">Anchors&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Assertion&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>^&lt;/code>&lt;/td>
&lt;td>By default, the match must start at the beginning of the string; in multiline mode, it must start at the beginning of the line.&lt;/td>
&lt;td>&lt;code>^\d{3}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;901&amp;quot;&lt;/code> in &lt;code>&amp;quot;901-333-&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$&lt;/code>&lt;/td>
&lt;td>By default, the match must occur at the end of the string or before &lt;code>\n&lt;/code> at the end of the string; in multiline mode, it must occur before the end of the line or before &lt;code>\n&lt;/code> at the end of the line.&lt;/td>
&lt;td>&lt;code>-\d{3}$&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;-333&amp;quot;&lt;/code> in &lt;code>&amp;quot;-901-333&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\A&lt;/code>&lt;/td>
&lt;td>The match must occur at the start of the string.&lt;/td>
&lt;td>&lt;code>\A\d{3}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;901&amp;quot;&lt;/code> in &lt;code>&amp;quot;901-333-&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\Z&lt;/code>&lt;/td>
&lt;td>The match must occur at the end of the string or before &lt;code>\n&lt;/code> at the end of the string.&lt;/td>
&lt;td>&lt;code>-\d{3}\Z&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;-333&amp;quot;&lt;/code> in &lt;code>&amp;quot;-901-333&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\z&lt;/code>&lt;/td>
&lt;td>The match must occur at the end of the string.&lt;/td>
&lt;td>&lt;code>-\d{3}\z&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;-333&amp;quot;&lt;/code> in &lt;code>&amp;quot;-901-333&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\G&lt;/code>&lt;/td>
&lt;td>The match must occur at the point where the previous match ended.&lt;/td>
&lt;td>&lt;code>\G\(\d\)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;(1)&amp;quot;&lt;/code>, &lt;code>&amp;quot;(3)&amp;quot;&lt;/code>, &lt;code>&amp;quot;(5)&amp;quot;&lt;/code> in &lt;code>&amp;quot;(1)(3)(5)[7](9)&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\b&lt;/code>&lt;/td>
&lt;td>The match must occur on a boundary between a &lt;code>\w&lt;/code> (alphanumeric) and a &lt;code>\W&lt;/code> (nonalphanumeric) character.&lt;/td>
&lt;td>&lt;code>\b\w+\s\w+\b&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;them theme&amp;quot;&lt;/code>, &lt;code>&amp;quot;them them&amp;quot;&lt;/code> in &lt;code>&amp;quot;them theme them them&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\B&lt;/code>&lt;/td>
&lt;td>The match must not occur on a &lt;code>\b&lt;/code> boundary.&lt;/td>
&lt;td>&lt;code>\Bend\w*\b&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ends&amp;quot;&lt;/code>, &lt;code>&amp;quot;ender&amp;quot;&lt;/code> in &lt;code>&amp;quot;end sends endure lender&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
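&lt;p>A minimal sketch of how a few of these anchors behave, again using Python's &lt;code>re&lt;/code> module as an approximation of the .NET engine. The two-line input is an assumption added to show multiline mode. Anchor names are not fully portable between engines: in Python, for instance, &lt;code>\Z&lt;/code> matches only at the very end of the string, which corresponds to &lt;code>\z&lt;/code> in the table above.&lt;/p>

```python
import re

# Two lines of input; the second line is an illustrative addition.
text = "901-333-\n415-555-"

# Without multiline mode, ^ anchors only at the start of the string.
start_of_string = re.findall(r"^\d{3}", text)
# With multiline mode, ^ also anchors at the start of every line.
start_of_line = re.findall(r"^\d{3}", text, re.MULTILINE)

# \b requires a word boundary; \B requires that there is none.
word_pairs = re.findall(r"\b\w+\s\w+\b", "them theme them them")
inner_end = re.findall(r"\Bend\w*\b", "end sends endure lender")

print(start_of_string)  # ['901']
print(start_of_line)    # ['901', '415']
print(word_pairs)       # ['them theme', 'them them']
print(inner_end)        # ['ends', 'ender']
```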
&lt;h2 id="grouping-constructs">Grouping Constructs&lt;/h2>
&lt;p>Grouping constructs delineate subexpressions of a regular expression and typically capture substrings of an input string. Grouping constructs include the language elements listed in the following table. For more information, see &lt;a href="grouping-constructs-in-regular-expressions" data-linktype="relative-path">Grouping Constructs&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Grouping construct&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>(&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Captures the matched subexpression and assigns it a one-based ordinal number.&lt;/td>
&lt;td>&lt;code>(\w)\1&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ee&amp;quot;&lt;/code> in &lt;code>&amp;quot;deep&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;lt;&lt;/code> &lt;em>name&lt;/em> &lt;code>&amp;gt;&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;br/> or &lt;br/>&lt;code>(?'&lt;/code> &lt;em>name&lt;/em> &lt;code>'&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Captures the matched subexpression into a named group.&lt;/td>
&lt;td>&lt;code>(?&amp;lt;double&amp;gt;\w)\k&amp;lt;double&amp;gt;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ee&amp;quot;&lt;/code> in &lt;code>&amp;quot;deep&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;lt;&lt;/code> &lt;em>name1&lt;/em> &lt;code>-&lt;/code> &lt;em>name2&lt;/em> &lt;code>&amp;gt;&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code> &lt;br/> or &lt;br/> &lt;code>(?'&lt;/code> &lt;em>name1&lt;/em> &lt;code>-&lt;/code> &lt;em>name2&lt;/em> &lt;code>'&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Defines a balancing group definition. For more information, see the &amp;quot;Balancing Group Definition&amp;quot; section in &lt;a href="grouping-constructs-in-regular-expressions" data-linktype="relative-path">Grouping Constructs&lt;/a>.&lt;/td>
&lt;td>&lt;code>(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;((1-3)*(3-1))&amp;quot;&lt;/code> in &lt;code>&amp;quot;3+2^((1-3)*(3-1))&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?:&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Defines a noncapturing group.&lt;/td>
&lt;td>&lt;code>Write(?:Line)?&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;WriteLine&amp;quot;&lt;/code> in &lt;code>&amp;quot;Console.WriteLine()&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;Write&amp;quot;&lt;/code> in &lt;code>&amp;quot;Console.Write(value)&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?imnsx-imnsx:&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Applies or disables the specified options within &lt;em>subexpression&lt;/em>. For more information, see &lt;a href="regular-expression-options" data-linktype="relative-path">Regular Expression Options&lt;/a>.&lt;/td>
&lt;td>&lt;code>A\d{2}(?i:\w+)\b&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;A12xl&amp;quot;&lt;/code>, &lt;code>&amp;quot;A12XL&amp;quot;&lt;/code> in &lt;code>&amp;quot;A12xl A12XL a12xl&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?=&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Zero-width positive lookahead assertion.&lt;/td>
&lt;td>&lt;code>\b\w+\b(?=.+and.+)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;cats&amp;quot;&lt;/code>, &lt;code>&amp;quot;dogs&amp;quot;&lt;/code>&lt;br/>in&lt;br/>&lt;code>&amp;quot;cats, dogs and some mice.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?!&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Zero-width negative lookahead assertion.&lt;/td>
&lt;td>&lt;code>\b\w+\b(?!.+and.+)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;and&amp;quot;&lt;/code>, &lt;code>&amp;quot;some&amp;quot;&lt;/code>, &lt;code>&amp;quot;mice&amp;quot;&lt;/code>&lt;br/>in&lt;br/>&lt;code>&amp;quot;cats, dogs and some mice.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;lt;=&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Zero-width positive lookbehind assertion.&lt;/td>
&lt;td>&lt;code>\b\w+\b(?&amp;lt;=.+and.+)&lt;/code>&lt;br/>&lt;br/>———————————&lt;br/>&lt;br/>&lt;code>\b\w+\b(?&amp;lt;=.+and.*)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;some&amp;quot;&lt;/code>, &lt;code>&amp;quot;mice&amp;quot;&lt;/code>&lt;br/>in&lt;br/>&lt;code>&amp;quot;cats, dogs and some mice.&amp;quot;&lt;/code>&lt;br/>————————————&lt;br/>&lt;code>&amp;quot;and&amp;quot;&lt;/code>, &lt;code>&amp;quot;some&amp;quot;&lt;/code>, &lt;code>&amp;quot;mice&amp;quot;&lt;/code>&lt;br/>in&lt;br/>&lt;code>&amp;quot;cats, dogs and some mice.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;lt;!&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Zero-width negative lookbehind assertion.&lt;/td>
&lt;td>&lt;code>\b\w+\b(?&amp;lt;!.+and.+)&lt;/code>&lt;br/>&lt;br/>———————————&lt;br/>&lt;br/>&lt;code>\b\w+\b(?&amp;lt;!.+and.*)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;cats&amp;quot;&lt;/code>, &lt;code>&amp;quot;dogs&amp;quot;&lt;/code>, &lt;code>&amp;quot;and&amp;quot;&lt;/code>&lt;br/>in&lt;br/>&lt;code>&amp;quot;cats, dogs and some mice.&amp;quot;&lt;/code>&lt;br/>————————————&lt;br/>&lt;code>&amp;quot;cats&amp;quot;&lt;/code>, &lt;code>&amp;quot;dogs&amp;quot;&lt;/code>&lt;br/>in&lt;br/>&lt;code>&amp;quot;cats, dogs and some mice.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;gt;&lt;/code> &lt;em>subexpression&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Atomic group.&lt;/td>
&lt;td>&lt;code>(?&amp;gt;a|ab)c&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ac&amp;quot;&lt;/code> in &lt;code>&amp;quot;ac&amp;quot;&lt;/code>&lt;br/>&lt;br/>&lt;em>nothing&lt;/em> in &lt;code>&amp;quot;abc&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="lookarounds-at-a-glance">Lookarounds at a glance&lt;/h3>
&lt;p>When the regular expression engine hits a &lt;strong>lookaround expression&lt;/strong>, it takes a substring reaching from the current position to the start (lookbehind) or end (lookahead) of the original string, and then runs
&lt;a href="https://siqi-zheng.rbind.io/en-us/dotnet/api/system.text.regularexpressions.regex.ismatch" data-linktype="absolute-path">Regex.IsMatch&lt;/a> on that substring using the lookaround pattern. Whether the lookaround as a whole succeeds then depends on whether it is a positive or a negative assertion: a positive assertion succeeds when the pattern matches, and a negative assertion succeeds when it does not.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Lookaround&lt;/th>
&lt;th>Name&lt;/th>
&lt;th>Function&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>(?=check)&lt;/code>&lt;/td>
&lt;td>Positive Lookahead&lt;/td>
&lt;td>Asserts that what immediately follows the current position in the string is &amp;quot;check&amp;quot;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;lt;=check)&lt;/code>&lt;/td>
&lt;td>Positive Lookbehind&lt;/td>
&lt;td>Asserts that what immediately precedes the current position in the string is &amp;quot;check&amp;quot;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?!check)&lt;/code>&lt;/td>
&lt;td>Negative Lookahead&lt;/td>
&lt;td>Asserts that what immediately follows the current position in the string is not &amp;quot;check&amp;quot;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?&amp;lt;!check)&lt;/code>&lt;/td>
&lt;td>Negative Lookbehind&lt;/td>
&lt;td>Asserts that what immediately precedes the current position in the string is not &amp;quot;check&amp;quot;&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Once an &lt;strong>atomic group&lt;/strong> has matched, the engine does not backtrack into it, even if the remainder of the pattern fails because of that match. This can significantly improve performance when quantifiers occur within the atomic group or in the remainder of the pattern.&lt;/p>
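&lt;p>The two lookahead rows from the grouping table can be reproduced with a short sketch. It uses Python's &lt;code>re&lt;/code> module as an approximation of the .NET engine. The lookbehind rows are left out because Python only accepts fixed-width lookbehind patterns, so they cannot be carried over unchanged.&lt;/p>

```python
import re

sentence = "cats, dogs and some mice."

# Positive lookahead: keep a word only if "and" appears somewhere after it,
# with at least one character on each side of "and".
before_and = re.findall(r"\b\w+\b(?=.+and.+)", sentence)

# Negative lookahead: keep a word only if that is NOT the case.
not_before_and = re.findall(r"\b\w+\b(?!.+and.+)", sentence)

print(before_and)      # ['cats', 'dogs']
print(not_before_and)  # ['and', 'some', 'mice']
```

&lt;p>In both cases the lookahead consumes no characters, so only the word itself ends up in the match.&lt;/p>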
&lt;h2 id="quantifiers">Quantifiers&lt;/h2>
&lt;p>A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. Quantifiers include the language elements listed in the following table. For more information, see &lt;a href="quantifiers-in-regular-expressions" data-linktype="relative-path">Quantifiers&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Quantifier&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>*&lt;/code>&lt;/td>
&lt;td>Matches the previous element zero or more times.&lt;/td>
&lt;td>&lt;code>\d*\.\d&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;.0&amp;quot;&lt;/code>, &lt;code>&amp;quot;19.9&amp;quot;&lt;/code>, &lt;code>&amp;quot;219.9&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>+&lt;/code>&lt;/td>
&lt;td>Matches the previous element one or more times.&lt;/td>
&lt;td>&lt;code>be+&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;bee&amp;quot;&lt;/code> in &lt;code>&amp;quot;been&amp;quot;&lt;/code>, &lt;code>&amp;quot;be&amp;quot;&lt;/code> in &lt;code>&amp;quot;bent&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>?&lt;/code>&lt;/td>
&lt;td>Matches the previous element zero or one time.&lt;/td>
&lt;td>&lt;code>rai?n&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ran&amp;quot;&lt;/code>, &lt;code>&amp;quot;rain&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>{&lt;/code> &lt;em>n&lt;/em> &lt;code>}&lt;/code>&lt;/td>
&lt;td>Matches the previous element exactly &lt;em>n&lt;/em> times.&lt;/td>
&lt;td>&lt;code>,\d{3}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;,043&amp;quot;&lt;/code> in &lt;code>&amp;quot;1,043.6&amp;quot;&lt;/code>, &lt;code>&amp;quot;,876&amp;quot;&lt;/code>, &lt;code>&amp;quot;,543&amp;quot;&lt;/code>, and &lt;code>&amp;quot;,210&amp;quot;&lt;/code> in &lt;code>&amp;quot;9,876,543,210&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>{&lt;/code> &lt;em>n&lt;/em> &lt;code>,}&lt;/code>&lt;/td>
&lt;td>Matches the previous element at least &lt;em>n&lt;/em> times.&lt;/td>
&lt;td>&lt;code>\d{2,}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;166&amp;quot;&lt;/code>, &lt;code>&amp;quot;29&amp;quot;&lt;/code>, &lt;code>&amp;quot;1930&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>{&lt;/code> &lt;em>n&lt;/em> &lt;code>,&lt;/code> &lt;em>m&lt;/em> &lt;code>}&lt;/code>&lt;/td>
&lt;td>Matches the previous element at least &lt;em>n&lt;/em> times, but no more than &lt;em>m&lt;/em> times.&lt;/td>
&lt;td>&lt;code>\d{3,5}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;166&amp;quot;&lt;/code>, &lt;code>&amp;quot;17668&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;19302&amp;quot;&lt;/code> in &lt;code>&amp;quot;193024&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>*?&lt;/code>&lt;/td>
&lt;td>Matches the previous element zero or more times, but as few times as possible.&lt;/td>
&lt;td>&lt;code>\d*?\.\d&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;.0&amp;quot;&lt;/code>, &lt;code>&amp;quot;19.9&amp;quot;&lt;/code>, &lt;code>&amp;quot;219.9&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>+?&lt;/code>&lt;/td>
&lt;td>Matches the previous element one or more times, but as few times as possible.&lt;/td>
&lt;td>&lt;code>be+?&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;be&amp;quot;&lt;/code> in &lt;code>&amp;quot;been&amp;quot;&lt;/code>, &lt;code>&amp;quot;be&amp;quot;&lt;/code> in &lt;code>&amp;quot;bent&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>??&lt;/code>&lt;/td>
&lt;td>Matches the previous element zero or one time, but as few times as possible.&lt;/td>
&lt;td>&lt;code>rai??n&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ran&amp;quot;&lt;/code>, &lt;code>&amp;quot;rain&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>{&lt;/code> &lt;em>n&lt;/em> &lt;code>}?&lt;/code>&lt;/td>
&lt;td>Matches the preceding element exactly &lt;em>n&lt;/em> times.&lt;/td>
&lt;td>&lt;code>,\d{3}?&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;,043&amp;quot;&lt;/code> in &lt;code>&amp;quot;1,043.6&amp;quot;&lt;/code>, &lt;code>&amp;quot;,876&amp;quot;&lt;/code>, &lt;code>&amp;quot;,543&amp;quot;&lt;/code>, and &lt;code>&amp;quot;,210&amp;quot;&lt;/code> in &lt;code>&amp;quot;9,876,543,210&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>{&lt;/code> &lt;em>n&lt;/em> &lt;code>,}?&lt;/code>&lt;/td>
&lt;td>Matches the previous element at least &lt;em>n&lt;/em> times, but as few times as possible.&lt;/td>
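&lt;td>&lt;code>\d{2,}?&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;16&amp;quot;&lt;/code> in &lt;code>&amp;quot;166&amp;quot;&lt;/code>, &lt;code>&amp;quot;29&amp;quot;&lt;/code>, &lt;code>&amp;quot;19&amp;quot;&lt;/code> and &lt;code>&amp;quot;30&amp;quot;&lt;/code> in &lt;code>&amp;quot;1930&amp;quot;&lt;/code>&lt;/td>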
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>{&lt;/code> &lt;em>n&lt;/em> &lt;code>,&lt;/code> &lt;em>m&lt;/em> &lt;code>}?&lt;/code>&lt;/td>
&lt;td>Matches the previous element between &lt;em>n&lt;/em> and &lt;em>m&lt;/em> times, but as few times as possible.&lt;/td>
&lt;td>&lt;code>\d{3,5}?&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;166&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;176&amp;quot;&lt;/code> in &lt;code>&amp;quot;17668&amp;quot;&lt;/code>&lt;br/>&lt;br/> &lt;code>&amp;quot;193&amp;quot;&lt;/code>, &lt;code>&amp;quot;024&amp;quot;&lt;/code> in &lt;code>&amp;quot;193024&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
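&lt;p>The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses Python's &lt;code>re&lt;/code> module, which treats these particular quantifiers the same way as the .NET engine as far as this example goes. The input &lt;code>"been bent"&lt;/code> joins the two sample words from the table into one string.&lt;/p>

```python
import re

# Greedy: take as many "e" characters as possible.
greedy_e = re.findall(r"be+", "been bent")
# Lazy: stop as soon as the minimum (one "e") is satisfied.
lazy_e = re.findall(r"be+?", "been bent")

# Greedy: up to five digits per match; the trailing "4" is too short to match.
greedy_digits = re.findall(r"\d{3,5}", "193024")
# Lazy: only the minimum of three digits per match.
lazy_digits = re.findall(r"\d{3,5}?", "193024")

print(greedy_e)       # ['bee', 'be']
print(lazy_e)         # ['be', 'be']
print(greedy_digits)  # ['19302']
print(lazy_digits)    # ['193', '024']
```

&lt;p>A lazy quantifier only expands beyond its minimum when the rest of the pattern needs it to, which is why &lt;code>\d*?\.\d&lt;/code> in the table still matches all the digits before the decimal point.&lt;/p>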
&lt;h2 id="backreference-constructs">Backreference Constructs&lt;/h2>
&lt;p>A backreference allows a previously matched subexpression to be identified subsequently in the same regular expression. The following table lists the backreference constructs supported by regular expressions in .NET. For more information, see &lt;a href="backreference-constructs-in-regular-expressions" data-linktype="relative-path">Backreference Constructs&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Backreference construct&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>\&lt;/code> &lt;em>number&lt;/em>&lt;/td>
&lt;td>Backreference. Matches the value of a numbered subexpression.&lt;/td>
&lt;td>&lt;code>(\w)\1&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ee&amp;quot;&lt;/code> in &lt;code>&amp;quot;seek&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>\k&amp;lt;&lt;/code> &lt;em>name&lt;/em> &lt;code>&amp;gt;&lt;/code>&lt;/td>
&lt;td>Named backreference. Matches the value of a named expression.&lt;/td>
&lt;td>&lt;code>(?&amp;lt;char&amp;gt;\w)\k&amp;lt;char&amp;gt;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;ee&amp;quot;&lt;/code> in &lt;code>&amp;quot;seek&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
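&lt;p>A numbered backreference can be sketched as follows, using Python's &lt;code>re&lt;/code> module as a stand-in for the .NET engine. The second input word is an illustrative assumption. Named backreferences are omitted because Python spells them differently from the &lt;code>\k&lt;/code> syntax shown above.&lt;/p>

```python
import re

# (\w) captures one word character; \1 requires the same character again.
doubled = re.search(r"(\w)\1", "seek")
first_double = doubled.group(0)

# finditer exposes the whole match, not just the captured group.
all_doubles = [m.group(0) for m in re.finditer(r"(\w)\1", "bookkeeper")]

print(first_double)  # ee
print(all_doubles)   # ['oo', 'kk', 'ee']
```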
&lt;h2 id="alternation-constructs">Alternation Constructs&lt;/h2>
&lt;p>Alternation constructs modify a regular expression to enable either/or matching. These constructs include the language elements listed in the following table. For more information, see &lt;a href="alternation-constructs-in-regular-expressions" data-linktype="relative-path">Alternation Constructs&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Alternation construct&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>|&lt;/code>&lt;/td>
&lt;td>Matches any one element separated by the vertical bar (&lt;code>|&lt;/code>) character.&lt;/td>
&lt;td>&lt;code>th(e|is|at)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;the&amp;quot;&lt;/code>, &lt;code>&amp;quot;this&amp;quot;&lt;/code> in &lt;code>&amp;quot;this is the day.&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?(&lt;/code> &lt;em>expression&lt;/em> &lt;code>)&lt;/code> &lt;em>yes&lt;/em> &lt;code>|&lt;/code> &lt;em>no&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Matches &lt;em>yes&lt;/em> if the regular expression pattern designated by &lt;em>expression&lt;/em> matches; otherwise, matches the optional &lt;em>no&lt;/em> part. &lt;em>expression&lt;/em> is interpreted as a zero-width assertion.&lt;/td>
&lt;td>&lt;code>(?(A)A\d{2}\b|\b\d{3}\b)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;A10&amp;quot;&lt;/code>, &lt;code>&amp;quot;910&amp;quot;&lt;/code> in &lt;code>&amp;quot;A10 C103 910&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?(&lt;/code> &lt;em>name&lt;/em> &lt;code>)&lt;/code> &lt;em>yes&lt;/em> &lt;code>|&lt;/code> &lt;em>no&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Matches &lt;em>yes&lt;/em> if &lt;em>name&lt;/em>, a named or numbered capturing group, has a match; otherwise, matches the optional &lt;em>no&lt;/em>.&lt;/td>
&lt;td>&lt;code>(?&amp;lt;quoted&amp;gt;&amp;quot;)?(?(quoted).+?&amp;quot;|\S+\s)&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;Dogs.jpg &amp;quot;&lt;/code>, &lt;code>&amp;quot;\&amp;quot;Yiska playing.jpg\&amp;quot;&amp;quot;&lt;/code> in &lt;code>&amp;quot;Dogs.jpg \&amp;quot;Yiska playing.jpg\&amp;quot;&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="substitutions">Substitutions&lt;/h2>
&lt;p>Substitutions are regular expression language elements that are supported in replacement patterns. The following table lists the substitution elements. For more information, see &lt;a href="substitutions-in-regular-expressions" data-linktype="relative-path">Substitutions&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Character&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Replacement pattern&lt;/th>
&lt;th>Input string&lt;/th>
&lt;th>Result string&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>$&lt;/code> &lt;em>number&lt;/em>&lt;/td>
&lt;td>Substitutes the substring matched by group &lt;em>number&lt;/em>.&lt;/td>
&lt;td>&lt;code>\b(\w+)(\s)(\w+)\b&lt;/code>&lt;/td>
&lt;td>&lt;code>$3$2$1&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;one two&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;two one&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>${&lt;/code> &lt;em>name&lt;/em> &lt;code>}&lt;/code>&lt;/td>
&lt;td>Substitutes the substring matched by the named group &lt;em>name&lt;/em>.&lt;/td>
&lt;td>&lt;code>\b(?&amp;lt;word1&amp;gt;\w+)(\s)(?&amp;lt;word2&amp;gt;\w+)\b&lt;/code>&lt;/td>
&lt;td>&lt;code>${word2} ${word1}&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;one two&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;two one&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$$&lt;/code>&lt;/td>
&lt;td>Substitutes a literal &amp;quot;$&amp;quot;.&lt;/td>
&lt;td>&lt;code>\b(\d+)\s?USD&lt;/code>&lt;/td>
&lt;td>&lt;code>$$$1&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;103 USD&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;$103&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$&amp;amp;&lt;/code>&lt;/td>
&lt;td>Substitutes a copy of the whole match.&lt;/td>
&lt;td>&lt;code>\$?\d*\.?\d+&lt;/code>&lt;/td>
&lt;td>&lt;code>**$&amp;amp;**&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;$1.30&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;**$1.30**&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$`&lt;/code>&lt;/td>
&lt;td>Substitutes all the text of the input string before the match.&lt;/td>
&lt;td>&lt;code>B+&lt;/code>&lt;/td>
&lt;td>&lt;code>$`&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AABBCC&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AAAACC&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$'&lt;/code>&lt;/td>
&lt;td>Substitutes all the text of the input string after the match.&lt;/td>
&lt;td>&lt;code>B+&lt;/code>&lt;/td>
&lt;td>&lt;code>$'&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AABBCC&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AACCCC&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$+&lt;/code>&lt;/td>
&lt;td>Substitutes the last group that was captured.&lt;/td>
&lt;td>&lt;code>B+(C+)&lt;/code>&lt;/td>
&lt;td>&lt;code>$+&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AABBCCDD&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AACCDD&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>$_&lt;/code>&lt;/td>
&lt;td>Substitutes the entire input string.&lt;/td>
&lt;td>&lt;code>B+&lt;/code>&lt;/td>
&lt;td>&lt;code>$_&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AABBCC&amp;quot;&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;AAAABBCCCC&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="regular-expression-options">Regular Expression Options&lt;/h2>
&lt;p>You can specify options that control how the regular expression engine interprets a regular expression pattern. Many of these options can be specified either inline (in the regular expression pattern) or as one or more &lt;a href="https://siqi-zheng.rbind.io/en-us/dotnet/api/system.text.regularexpressions.regexoptions" data-linktype="absolute-path">RegexOptions&lt;/a> constants. This quick reference lists only inline options. For more information about inline and &lt;a href="https://siqi-zheng.rbind.io/en-us/dotnet/api/system.text.regularexpressions.regexoptions" data-linktype="absolute-path">RegexOptions&lt;/a> options, see the article &lt;a href="regular-expression-options" data-linktype="relative-path">Regular Expression Options&lt;/a>.&lt;/p>
&lt;p>You can specify an inline option in two ways:&lt;/p>
&lt;ul>
&lt;li>By using the &lt;a href="miscellaneous-constructs-in-regular-expressions" data-linktype="relative-path">miscellaneous construct&lt;/a> &lt;code>(?imnsx-imnsx)&lt;/code>, where a minus sign (-) before an option or set of options turns those options off. For example, &lt;code>(?i-mn)&lt;/code> turns case-insensitive matching (&lt;code>i&lt;/code>) on, turns multiline mode (&lt;code>m&lt;/code>) off, and turns unnamed group captures (&lt;code>n&lt;/code>) off. The option applies to the regular expression pattern from the point at which the option is defined, and is effective either to the end of the pattern or to the point where another construct reverses the option.&lt;/li>
&lt;li>By using the &lt;a href="grouping-constructs-in-regular-expressions" data-linktype="relative-path">grouping construct&lt;/a> &lt;code>(?imnsx-imnsx:&lt;/code>&lt;em>subexpression&lt;/em>&lt;code>)&lt;/code>, which defines options for the specified group only.&lt;/li>
&lt;/ul>
&lt;p>The .NET regular expression engine supports the following inline options:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Option&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Pattern&lt;/th>
&lt;th>Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>i&lt;/code>&lt;/td>
&lt;td>Use case-insensitive matching.&lt;/td>
&lt;td>&lt;code>\b(?i)a(?-i)a\w+\b&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;aardvark&amp;quot;&lt;/code>, &lt;code>&amp;quot;aaaAuto&amp;quot;&lt;/code> in &lt;code>&amp;quot;aardvark AAAuto aaaAuto Adam breakfast&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>m&lt;/code>&lt;/td>
&lt;td>Use multiline mode. &lt;code>^&lt;/code> and &lt;code>$&lt;/code> match the beginning and end of a line, instead of the beginning and end of a string.&lt;/td>
&lt;td>For an example, see the &amp;quot;Multiline Mode&amp;quot; section in &lt;a href="regular-expression-options" data-linktype="relative-path">Regular Expression Options&lt;/a>.&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>n&lt;/code>&lt;/td>
&lt;td>Do not capture unnamed groups.&lt;/td>
&lt;td>For an example, see the &amp;quot;Explicit Captures Only&amp;quot; section in &lt;a href="regular-expression-options" data-linktype="relative-path">Regular Expression Options&lt;/a>.&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>s&lt;/code>&lt;/td>
&lt;td>Use single-line mode.&lt;/td>
&lt;td>For an example, see the &amp;quot;Single-line Mode&amp;quot; section in &lt;a href="regular-expression-options" data-linktype="relative-path">Regular Expression Options&lt;/a>.&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>x&lt;/code>&lt;/td>
&lt;td>Ignore unescaped white space in the regular expression pattern.&lt;/td>
&lt;td>&lt;code>\b(?x) \d+ \s \w+&lt;/code>&lt;/td>
&lt;td>&lt;code>&amp;quot;1 aardvark&amp;quot;&lt;/code>, &lt;code>&amp;quot;2 cats&amp;quot;&lt;/code> in &lt;code>&amp;quot;1 aardvark 2 cats IV centurions&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
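&lt;p>A rough sketch of a scoped inline option, using the Python &lt;code>re&lt;/code> module instead of .NET (an assumption made only so the example is easy to run). Python handles mid-pattern toggles differently from .NET, so the group form &lt;code>(?i:a)&lt;/code> stands in for &lt;code>(?i)a(?-i)&lt;/code> from the first row of the table.&lt;/p>
&lt;pre>&lt;code class="language-python">import re

# Only the first letter is matched case-insensitively;
# the second letter must be a lowercase a.
pattern = re.compile(r'\b(?i:a)a\w+\b')
text = 'aardvark AAAuto aaaAuto Adam breakfast'

matches = pattern.findall(text)
print(matches)
&lt;/code>&lt;/pre>
&lt;p>The word AAAuto is skipped because its second letter is uppercase, which is the same outcome the table describes.&lt;/p>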
&lt;h2 id="miscellaneous-constructs">Miscellaneous Constructs&lt;/h2>
&lt;p>Miscellaneous constructs either modify a regular expression pattern or provide information about it. The following table lists the miscellaneous constructs supported by .NET. For more information, see &lt;a href="miscellaneous-constructs-in-regular-expressions" data-linktype="relative-path">Miscellaneous Constructs&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Construct&lt;/th>
&lt;th>Definition&lt;/th>
&lt;th>Example&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>(?imnsx-imnsx)&lt;/code>&lt;/td>
&lt;td>Sets or disables options such as case insensitivity in the middle of a pattern. For more information, see &lt;a href="regular-expression-options" data-linktype="relative-path">Regular Expression Options&lt;/a>.&lt;/td>
&lt;td>&lt;code>\bA(?i)b\w+\b&lt;/code> matches &lt;code>&amp;quot;ABA&amp;quot;&lt;/code>, &lt;code>&amp;quot;Able&amp;quot;&lt;/code> in &lt;code>&amp;quot;ABA Able Act&amp;quot;&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>(?#&lt;/code> &lt;em>comment&lt;/em> &lt;code>)&lt;/code>&lt;/td>
&lt;td>Inline comment. The comment ends at the first closing parenthesis.&lt;/td>
&lt;td>&lt;code>\bA(?#Matches words starting with A)\w+\b&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>#&lt;/code> [to end of line]&lt;/td>
&lt;td>X-mode comment. The comment starts at an unescaped &lt;code>#&lt;/code> and continues to the end of the line.&lt;/td>
&lt;td>&lt;code>(?x)\bA\w+\b#Matches words starting with A&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="see-also">See also&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular%20expressions%20quick%20reference.docx" data-linktype="external">Regular Expressions - Quick Reference (download in Word format)&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular%20expressions%20quick%20reference.pdf" data-linktype="external">Regular Expressions - Quick Reference (download in PDF format)&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Learning SQL Notes #4: Query Primer (CH. 7)</title><link>https://siqi-zheng.rbind.io/post/2021-05-27-sql-notes-4/</link><pubDate>Thu, 27 May 2021 20:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-05-27-sql-notes-4/</guid><description>&lt;h1 id="working-with-sets">Working with Sets&lt;/h1>
&lt;ul>
&lt;li>
&lt;a href="#working-with-sets">Working with Sets&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#set-theory-in-practice">Set Theory in Practice&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#set-operators">Set Operators&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#the-union-operator">The UNION Operator&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#the-intersect-operator-not-for-mysql">The INTERSECT Operator (Not for MySQL!)&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#the-except-operator-not-for-mysql">The EXCEPT Operator (Not for MySQL!)&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#set-operation-rules">Set Operation Rules&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#sorting-compound-query-results">Sorting Compound Query Results&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#sort">Sort&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#order">Order&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="set-theory-in-practice">Set Theory in Practice&lt;/h2>
&lt;ul>
&lt;li>Both data sets must have the &lt;strong>same number of columns&lt;/strong>.&lt;/li>
&lt;li>The &lt;strong>data&lt;/strong> &lt;strong>types&lt;/strong> of each column across the two data sets must be the &lt;strong>same&lt;/strong> (or the server must be able to convert one to the other).&lt;/li>
&lt;/ul>
&lt;h2 id="set-operators">Set Operators&lt;/h2>
&lt;h3 id="the-union-operator">The UNION Operator&lt;/h3>
&lt;p>The &lt;code>union&lt;/code> and &lt;code>union all&lt;/code> operators allow you to combine multiple data sets. The difference between the two is that &lt;code>union&lt;/code> sorts the combined set and &lt;em>removes duplicates&lt;/em>, whereas &lt;code>union all&lt;/code> does not.&lt;/p>
&lt;p>&lt;img src="union_all.png" alt="">
&lt;a href="https://www.sqlshack.com/sql-union-vs-union-all-in-sql-server/">https://www.sqlshack.com/sql-union-vs-union-all-in-sql-server/&lt;/a>&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name
FROM customer c
WHERE c.first_name LIKE 'J%' AND c.last_name LIKE 'D%'
UNION ALL
SELECT a.first_name, a.last_name
FROM actor a
WHERE a.first_name LIKE 'J%' AND a.last_name LIKE 'D%';
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">first_name&lt;/th>
&lt;th align="right">last_name&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">JENNIFER&lt;/td>
&lt;td align="right">DAVIS&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JENNIFER&lt;/td>
&lt;td align="right">DAVIS&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JUDY&lt;/td>
&lt;td align="right">DEAN&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JODIE&lt;/td>
&lt;td align="right">DEGENERES&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JULIANNE&lt;/td>
&lt;td align="right">DENCH&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(dplyr)
union_all(df1,df2)
&lt;/code>&lt;/pre>
&lt;p>By contrast, &lt;code>UNION&lt;/code> removes the duplicate Jennifer Davis row:&lt;/p>
&lt;p>&lt;img src="uinon.png" alt="">
&lt;a href="https://www.sqlshack.com/sql-union-vs-union-all-in-sql-server/">https://www.sqlshack.com/sql-union-vs-union-all-in-sql-server/&lt;/a>&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name
FROM customer c
WHERE c.first_name LIKE 'J%' AND c.last_name LIKE 'D%'
UNION
SELECT a.first_name, a.last_name
FROM actor a
WHERE a.first_name LIKE 'J%' AND a.last_name LIKE 'D%';
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">first_name&lt;/th>
&lt;th align="right">last_name&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">JENNIFER&lt;/td>
&lt;td align="right">DAVIS&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JUDY&lt;/td>
&lt;td align="right">DEAN&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JODIE&lt;/td>
&lt;td align="right">DEGENERES&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JULIANNE&lt;/td>
&lt;td align="right">DENCH&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(dplyr)
union(df1,df2)
&lt;/code>&lt;/pre>
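&lt;p>The difference between the two operators can be checked with a small sketch. It uses Python with an in-memory SQLite database rather than MySQL, and the two tiny tables are made-up stand-ins for &lt;code>customer&lt;/code> and &lt;code>actor&lt;/code>, not the real sakila data.&lt;/p>
&lt;pre>&lt;code class="language-python">import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical toy rows; both tables share JENNIFER DAVIS.
conn.executescript('''
    CREATE TABLE customer (first_name TEXT, last_name TEXT);
    CREATE TABLE actor (first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES ('JENNIFER', 'DAVIS'), ('JUDY', 'DEAN');
    INSERT INTO actor VALUES ('JENNIFER', 'DAVIS'), ('JODIE', 'DEGENERES');
''')

query = '''
    SELECT first_name, last_name FROM customer
    {op}
    SELECT first_name, last_name FROM actor
'''
union_all_rows = conn.execute(query.format(op='UNION ALL')).fetchall()
union_rows = conn.execute(query.format(op='UNION')).fetchall()

print(len(union_all_rows))  # every row from both tables, duplicates kept
print(len(union_rows))      # the shared row appears only once
&lt;/code>&lt;/pre>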
&lt;h3 id="the-intersect-operator-not-for-mysql">The INTERSECT Operator (Not for MySQL!)&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name
FROM customer c
WHERE c.first_name LIKE 'J%' AND c.last_name LIKE 'D%'
INTERSECT
SELECT a.first_name, a.last_name
FROM actor a
WHERE a.first_name LIKE 'J%' AND a.last_name LIKE 'D%';
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">first_name&lt;/th>
&lt;th align="right">last_name&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">JENNIFER&lt;/td>
&lt;td align="right">DAVIS&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(dplyr)
intersect(df1,df2)
&lt;/code>&lt;/pre>
&lt;h3 id="the-except-operator-not-for-mysql">The EXCEPT Operator (Not for MySQL!)&lt;/h3>
&lt;pre>&lt;code class="language-sql">SELECT c.first_name, c.last_name
FROM customer c
WHERE c.first_name LIKE 'J%' AND c.last_name LIKE 'D%'
EXCEPT
SELECT a.first_name, a.last_name
FROM actor a
WHERE a.first_name LIKE 'J%' AND a.last_name LIKE 'D%';
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">first_name&lt;/th>
&lt;th align="right">last_name&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">JUDY&lt;/td>
&lt;td align="right">DEAN&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JODIE&lt;/td>
&lt;td align="right">DEGENERES&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">JULIANNE&lt;/td>
&lt;td align="right">DENCH&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">library(dplyr)
setdiff(df1,df2)
&lt;/code>&lt;/pre>
&lt;p>&lt;em>Set A&lt;/em>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">actor_id&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">10&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">11&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">12&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">10&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">10&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
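&lt;p>&lt;em>Set B&lt;/em>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">actor_id&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">10&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">10&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>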
&lt;p>The operation &lt;code>A except B&lt;/code> yields the following:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">actor_id&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">11&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">12&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The operation &lt;code>A except all B&lt;/code> yields the following:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="center">actor_id&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="center">10&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">11&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="center">12&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The difference between the two operations is that except removes all occurrences of duplicate data from set A, whereas except all removes only one occurrence of duplicate data from set A &lt;em>for every occurrence&lt;/em> in set B.&lt;/p>
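&lt;p>The two behaviours can be modelled outside a database. In this sketch, plain Python lists stand in for Set A and Set B (an assumption for illustration only): a set difference plays the role of &lt;code>except&lt;/code>, and a multiset difference with &lt;code>collections.Counter&lt;/code> plays the role of &lt;code>except all&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">from collections import Counter

set_a = [10, 11, 12, 10, 10]
set_b = [10, 10]

# EXCEPT: drop every value that appears in B, then deduplicate.
except_result = sorted(set(set_a) - set(set_b))

# EXCEPT ALL: cancel one occurrence in A per occurrence in B.
except_all_result = sorted((Counter(set_a) - Counter(set_b)).elements())

print(except_result)
print(except_all_result)
&lt;/code>&lt;/pre>
&lt;p>Set A holds three 10s and Set B holds two, so exactly one 10 survives &lt;code>except all&lt;/code>.&lt;/p>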
&lt;h2 id="set-operation-rules">Set Operation Rules&lt;/h2>
&lt;p>The following sections outline some rules that you must follow when working with compound queries.&lt;/p>
&lt;h3 id="sorting-compound-query-results">Sorting Compound Query Results&lt;/h3>
&lt;h4 id="sort">Sort&lt;/h4>
&lt;pre>&lt;code class="language-sql">SELECT a.first_name fname, a.last_name lname /*aliases can be helpful*/
FROM actor a
WHERE a.first_name LIKE 'J%' AND a.last_name LIKE 'D%'
UNION ALL
SELECT c.first_name, c.last_name
FROM customer c
WHERE c.first_name LIKE 'J%' AND c.last_name LIKE 'D%'
ORDER BY lname, fname;
&lt;/code>&lt;/pre>
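&lt;p>A minimal sketch of sorting a compound query, again assuming Python with an in-memory SQLite database and made-up rows instead of the sakila tables. The column names used in &lt;code>ORDER BY&lt;/code> are the aliases defined in the first query.&lt;/p>
&lt;pre>&lt;code class="language-python">import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical toy rows for illustration.
conn.executescript('''
    CREATE TABLE actor (first_name TEXT, last_name TEXT);
    CREATE TABLE customer (first_name TEXT, last_name TEXT);
    INSERT INTO actor VALUES ('JUDY', 'DEAN'), ('JENNIFER', 'DAVIS');
    INSERT INTO customer VALUES ('JENNIFER', 'DAVIS');
''')

# The names in ORDER BY (fname, lname) come from the first query.
cursor = conn.execute('''
    SELECT a.first_name AS fname, a.last_name AS lname FROM actor a
    UNION ALL
    SELECT c.first_name, c.last_name FROM customer c
    ORDER BY lname, fname
''')
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
&lt;/code>&lt;/pre>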
&lt;h4 id="order">Order&lt;/h4>
&lt;p>In general, compound queries containing three or more queries are evaluated in order from top to bottom, with the following caveats:&lt;/p>
&lt;ul>
&lt;li>The ANSI SQL specification calls for the intersect operator to have precedence over the other set operators.&lt;/li>
&lt;li>You may dictate the order in which queries are combined by enclosing multiple queries in parentheses.&lt;/li>
&lt;/ul>
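&lt;p>Why the evaluation order matters can be sketched with Python sets standing in for three query results (toy values, assumed for illustration). For a compound query shaped like A union B intersect C, plain top-to-bottom evaluation and intersect-first evaluation give different answers.&lt;/p>
&lt;pre>&lt;code class="language-python"># Three toy result sets standing in for three queries.
a = {1, 2}
b = {2, 3}
c = {3, 4}

# Top to bottom: (A union B) intersect C
top_to_bottom = a.union(b).intersection(c)

# Intersect first: A union (B intersect C)
intersect_first = a.union(b.intersection(c))

print(sorted(top_to_bottom))
print(sorted(intersect_first))
&lt;/code>&lt;/pre>
&lt;p>Parentheses in the SQL text play the same role as the nesting of the method calls here.&lt;/p>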
&lt;p>NOT FOR MySQL:&lt;/p>
&lt;p>You can also wrap adjoining queries in parentheses to override the default top-to-bottom processing of compound queries.&lt;/p>
&lt;pre>&lt;code class="language-sql">SELECT a.first_name, a.last_name
FROM actor a
WHERE a.first_name LIKE 'J%' AND a.last_name LIKE 'D%'
UNION
(SELECT a.first_name, a.last_name
FROM actor a
WHERE a.first_name LIKE 'M%' AND a.last_name LIKE 'T%'
UNION ALL
SELECT c.first_name, c.last_name
FROM customer c
WHERE c.first_name LIKE 'J%' AND c.last_name LIKE 'D%'
)
&lt;/code>&lt;/pre></description></item><item><title>Learning SQL Notes #3: Query Primer (CH. 3)</title><link>https://siqi-zheng.rbind.io/post/2021-05-26-sql-notes-3/</link><pubDate>Wed, 26 May 2021 20:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-05-26-sql-notes-3/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#query-mechanics">Query Mechanics&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#query-clauses">Query Clauses&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#select">SELECT&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#from">FROM&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#table-links">Table Links&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#table-aliases">Table Aliases&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#group-by-and-having-ch-8">GROUP BY and HAVING (CH. 8)&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#order-by">ORDER BY&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#filtering">Filtering&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#where">WHERE&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#or-operator">OR operator&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#and-operator">AND operator&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#not-operator">NOT operator&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#expressions">Expressions&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#null">NULL&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>Complete sometime this summer:&lt;/p>
&lt;ul>
&lt;li>&lt;input disabled="" type="checkbox"> Finish Join Notes;&lt;/li>
&lt;li>&lt;input disabled="" type="checkbox"> Finish GROUP BY Notes;&lt;/li>
&lt;/ul>
&lt;h2 id="query-mechanics">Query Mechanics&lt;/h2>
&lt;ul>
&lt;li>Do you have permission to execute the statement?&lt;/li>
&lt;li>Do you have permission to access the desired data?&lt;/li>
&lt;li>Is your statement syntax correct?&lt;/li>
&lt;/ul>
&lt;h2 id="query-clauses">Query Clauses&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Clause name&lt;/th>
&lt;th align="right">Purpose&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">select&lt;/td>
&lt;td align="right">Determines which columns to include in the query’s result set&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">from&lt;/td>
&lt;td align="right">Identifies the tables from which to retrieve data and how the tables should be joined&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">where&lt;/td>
&lt;td align="right">Filters out unwanted data&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">group by&lt;/td>
&lt;td align="right">Used to group rows together by common column values&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">having&lt;/td>
&lt;td align="right">Filters out unwanted groups&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">order by&lt;/td>
&lt;td align="right">Sorts the rows of the final result set by one or more columns&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="select">SELECT&lt;/h3>
&lt;ul>
&lt;li>Literals, such as numbers or strings&lt;/li>
&lt;li>Expressions, such as transaction.amount * −1&lt;/li>
&lt;li>Built-in function calls, such as ROUND(transaction.amount, 2)&lt;/li>
&lt;li>User-defined function calls&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-SQL">SELECT version(), user(), database();
&lt;/code>&lt;/pre>
&lt;p>Results:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">version()&lt;/th>
&lt;th align="center">user()&lt;/th>
&lt;th align="right">database()&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">8.0.15&lt;/td>
&lt;td align="center">root@localhost&lt;/td>
&lt;td align="right">sakila&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-SQL">SELECT row1 AS r1;/*Column Aliases*/
SELECT DISTINCT row1 /*Removing Duplicates-should know beforehand whether duplicates are possible*/
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">unique()
&lt;/code>&lt;/pre>
&lt;h3 id="from">FROM&lt;/h3>
&lt;ul>
&lt;li>Permanent tables (i.e., created using the create table statement)&lt;/li>
&lt;li>Derived tables (i.e., rows returned by a subquery and held in memory)
&lt;pre>&lt;code class="language-sql">SELECT *
FROM
(SELECT first_name, last_name, email
FROM customer
WHERE first_name = 'JESSIE'
) AS cust;
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>Temporary tables (i.e., volatile data held in memory): any data inserted into a temporary table will disappear at some point
&lt;pre>&lt;code class="language-sql">CREATE TEMPORARY TABLE actors_j
(actor_id smallint(5),
first_name varchar(45),
last_name varchar(45)
);
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>Virtual tables (i.e., created using the create view statement): When you issue a query against a view, your query is &lt;strong>merged&lt;/strong> with the view definition to create a final query to be executed.
&lt;pre>&lt;code class="language-SQL">CREATE VIEW cust_vw AS
SELECT customer_id, first_name, last_name, active
FROM customer;
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
&lt;h4 id="table-links">Table Links&lt;/h4>
&lt;p>See JOIN in the next note.&lt;/p>
&lt;h4 id="table-aliases">Table Aliases&lt;/h4>
&lt;pre>&lt;code class="language-SQL">FROM customer AS c;
&lt;/code>&lt;/pre>
&lt;h3 id="group-by-and-having-ch-8">GROUP BY and HAVING (CH. 8)&lt;/h3>
&lt;p>[] Haven&amp;rsquo;t done&lt;/p>
&lt;h3 id="order-by">ORDER BY&lt;/h3>
&lt;ol>
&lt;li>
&lt;pre>&lt;code class="language-sql">ORDER BY col1, col2, etc;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r"># base R: sort rows by col1 (ascending)
df[order(df$col1), ]
# dplyr equivalent
library(dplyr)
df %&amp;gt;%
  arrange(col1)
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>
&lt;pre>&lt;code class="language-sql">ORDER BY col1;
ORDER BY col1 desc;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r"># base R: sort rows by col1 (descending)
df[order(df$col1, decreasing = TRUE), ]
# dplyr equivalent
library(dplyr)
df %&amp;gt;%
  arrange(desc(col1))
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>
&lt;pre>&lt;code class="language-sql">SELECT col1, col2, col3
FROM table1
ORDER BY 3; /*equivalent to ORDER BY col3*/
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ol>
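&lt;p>Sorting by position can be verified with a short sketch, assuming Python with an in-memory SQLite database and a made-up &lt;code>table1&lt;/code>. Ordering by &lt;code>3&lt;/code> gives the same result as ordering by the third selected column.&lt;/p>
&lt;pre>&lt;code class="language-python">import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical toy table for illustration.
conn.executescript('''
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 INTEGER);
    INSERT INTO table1 VALUES (1, 'x', 30), (2, 'y', 10), (3, 'z', 20);
''')

by_name = conn.execute(
    'SELECT col1, col2, col3 FROM table1 ORDER BY col3 DESC').fetchall()
by_position = conn.execute(
    'SELECT col1, col2, col3 FROM table1 ORDER BY 3 DESC').fetchall()

print(by_position)
&lt;/code>&lt;/pre>
&lt;p>Positions are brittle: adding a column to the select list without updating the number silently changes the sort.&lt;/p>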
&lt;h2 id="filtering">Filtering&lt;/h2>
&lt;h3 id="where">WHERE&lt;/h3>
&lt;pre>&lt;code class="language-SQL">(...) AND (...)
(...) OR (...)
&lt;/code>&lt;/pre>
&lt;p>See &lt;strong>operators&lt;/strong> and &lt;strong>expressions&lt;/strong> for details.&lt;/p>
&lt;h4 id="or-operator">OR operator&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Intermediate result&lt;/th>
&lt;th align="right">Final result&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">WHERE true OR true&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE true OR false&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE false OR true&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE false OR false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="and-operator">AND operator&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Intermediate result&lt;/th>
&lt;th align="right">Final result&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">WHERE (true OR true) AND true&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (true OR false) AND true&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (false OR true) AND true&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (false OR false) AND true&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (true OR true) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (true OR false) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (false OR true) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE (false OR false) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="not-operator">NOT operator&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Intermediate result&lt;/th>
&lt;th align="right">Final result&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">WHERE NOT (true OR true) AND true&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (true OR false) AND true&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (false OR true) AND true&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (false OR false) AND true&lt;/td>
&lt;td align="right">true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (true OR true) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (true OR false) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (false OR true) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">WHERE NOT (false OR false) AND false&lt;/td>
&lt;td align="right">false&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
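&lt;p>The NOT table can be regenerated mechanically. This sketch uses Python booleans as a stand-in for SQL conditions (an assumption that ignores NULL and three-valued logic) and lists the only combination for which the whole condition is true.&lt;/p>
&lt;pre>&lt;code class="language-python">from itertools import product

# NOT binds tighter than AND, so the condition reads (NOT (a OR b)) AND c.
table = {
    (a, b, c): (not (a or b)) and c
    for a, b, c in product([True, False], repeat=3)
}

true_rows = [combo for combo, result in table.items() if result]
print(len(table))
print(true_rows)
&lt;/code>&lt;/pre>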
&lt;h4 id="expressions">Expressions&lt;/h4>
&lt;p>An expression can be any of the following:&lt;/p>
&lt;ul>
&lt;li>A number&lt;/li>
&lt;li>A column in a table or view&lt;/li>
&lt;li>A string literal, such as &amp;lsquo;Maple Street&amp;rsquo;&lt;/li>
&lt;li>A built-in function, such as concat(&amp;lsquo;Learning&amp;rsquo;, ' &amp;lsquo;, &amp;lsquo;SQL&amp;rsquo;)&lt;/li>
&lt;li>A subquery&lt;/li>
&lt;li>A list of expressions, such as (&amp;lsquo;Boston&amp;rsquo;, &amp;lsquo;New York&amp;rsquo;, &amp;lsquo;Chicago&amp;rsquo;)&lt;/li>
&lt;/ul>
&lt;p>Operators:&lt;/p>
&lt;ul>
&lt;li>Comparison operators, such as =, !=, &amp;lt;, &amp;lt;=, &amp;gt;, &amp;gt;=, &amp;lt;&amp;gt;, like, in, between, is null, exists&lt;/li>
&lt;li>Arithmetic operators, such as +, −, *, /, DIV (integer division) and (% or MOD) for modulus&lt;/li>
&lt;/ul>
&lt;p>Note:&lt;/p>
&lt;ol>
&lt;li>= can be used for date/string/number;&lt;/li>
&lt;li>&amp;lsquo;between and&amp;rsquo; can be used for date/string/number;&lt;/li>
&lt;li>&amp;lsquo;between and&amp;rsquo; is inclusive;&lt;/li>
&lt;li>col1 (not) in (&amp;lsquo;A&amp;rsquo;,&amp;lsquo;B&amp;rsquo;)/subqueries;&lt;/li>
&lt;li>built-in function: left(name, 1) in (&amp;lsquo;A&amp;rsquo;,&amp;lsquo;B&amp;rsquo;);&lt;/li>
&lt;li>wildcards/regular expressions:
&lt;ul>
&lt;li>Strings beginning/ending with a certain &lt;strong>character&lt;/strong>&lt;/li>
&lt;li>Strings beginning/ending with a &lt;strong>substring&lt;/strong>&lt;/li>
&lt;li>Strings containing a certain &lt;strong>character&lt;/strong> &lt;strong>anywhere&lt;/strong> within the string&lt;/li>
&lt;li>Strings containing a &lt;strong>substring anywhere&lt;/strong> within the string&lt;/li>
&lt;li>Strings with a &lt;strong>specific format&lt;/strong>, regardless of individual characters&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th align="left">Wildcard character&lt;/th>
&lt;th align="right">Matches&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td align="left">_&lt;/td>
&lt;td align="right">Exactly one character&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td align="left">%&lt;/td>
&lt;td align="right">Any number of characters (including 0)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
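&lt;p>A minimal sketch of the two wildcards, assuming Python with an in-memory SQLite database and a made-up list of last names. The helper &lt;code>matching&lt;/code> is a hypothetical name introduced only for this example.&lt;/p>
&lt;pre>&lt;code class="language-python">import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical toy rows for illustration.
conn.executescript('''
    CREATE TABLE customer (last_name TEXT);
    INSERT INTO customer VALUES ('DAVIS'), ('DEAN'), ('GRAY'), ('TATE');
''')

def matching(pattern):
    # Return the last names that match a LIKE pattern, in sorted order.
    sql = 'SELECT last_name FROM customer WHERE last_name LIKE ? ORDER BY last_name'
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_d = matching('D%')   # begins with D, then anything
second_is_a = matching('_A%')    # exactly one character, then A
four_chars = matching('____')    # exactly four characters

print(starts_with_d, second_is_a, four_chars)
&lt;/code>&lt;/pre>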
&lt;h4 id="null">NULL&lt;/h4>
&lt;p>Null is used for various cases where a value cannot be supplied, such as:&lt;/p>
&lt;ul>
&lt;li>Not applicable
Such as the employee ID column for a transaction that took place at an ATM machine&lt;/li>
&lt;li>Value not yet known
Such as when the federal ID is not known at the time a customer row is created&lt;/li>
&lt;li>Value undefined
Such as when an account is created for a product that has not yet been added to the database&lt;/li>
&lt;/ul>
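&lt;p>Because null marks a missing value, comparing against it does not behave like ordinary equality. A small sketch, assuming Python with an in-memory SQLite database (a SQL NULL comes back as &lt;code>None&lt;/code> in Python):&lt;/p>
&lt;pre>&lt;code class="language-python">import sqlite3

conn = sqlite3.connect(':memory:')

# Comparing with = yields NULL (None in Python), not true or false.
equals_null = conn.execute('SELECT NULL = NULL').fetchone()[0]

# IS NULL is the test that actually returns a truth value.
is_null = conn.execute('SELECT NULL IS NULL').fetchone()[0]

print(equals_null, is_null)
&lt;/code>&lt;/pre>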
&lt;p>Note:&lt;/p>
&lt;ul>
&lt;li>An expression can be null, but it can &lt;strong>never equal&lt;/strong> null. Test for null with IS NULL / IS NOT NULL rather than = or !=.&lt;/li>
&lt;li>Two nulls are &lt;strong>never equal to each other&lt;/strong>.&lt;/li>
&lt;/ul></description></item><item><title>Learning SQL Notes #2: Data Types</title><link>https://siqi-zheng.rbind.io/post/2021-05-26-sql-notes-2/</link><pubDate>Wed, 26 May 2021 01:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-05-26-sql-notes-2/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#character-data">Character Data&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#numeric-data">Numeric Data&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#temporal-data">Temporal Data&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#bouns-find-current-time">BONUS: Find Current Time&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="character-data">Character Data&lt;/h3>
&lt;pre>&lt;code class="language-SQL">char(20) /* fixed-length */
varchar(20) /* variable-length */
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> has no easy way to constrain the length of a character value, but &lt;code>stringr::str_trunc()&lt;/code> can truncate strings to a maximum width.&lt;/p>
&lt;p>Note:&lt;/p>
&lt;ol>
&lt;li>If the data being loaded into a text column exceeds the maximum size for that type, the data will be truncated;&lt;/li>
&lt;li>Trailing spaces &lt;strong>will not&lt;/strong> be removed when data is loaded into the column;&lt;/li>
&lt;li>When using text columns for sorting or grouping, only the first 1,024 bytes are used, although this limit may be increased if necessary.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-SQL">CREATE DATABASE european_sales CHARACTER SET latin1;
&lt;/code>&lt;/pre>
&lt;h3 id="numeric-data">Numeric Data&lt;/h3>
&lt;ol>
&lt;li>Boolean: 0 False, 1 True.&lt;/li>
&lt;li>System-generated primary keys: 1 to $\infty$, integers;
&lt;pre>&lt;code class="language-SQL">mediumint −8,388,608 to 8,388,607
mediumint unsigned 0 to 16,777,215
int −2,147,483,648 to 2,147,483,647
int unsigned 0 to 4,294,967,295
bigint −2^63 to 2^63 - 1
bigint unsigned 0 to 2^64 - 1
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>Item number: positive integers in a range;
&lt;pre>&lt;code class="language-SQL">tinyint −128 to 127
tinyint unsigned 0 to 255
smallint −32,768 to 32,767
smallint unsigned 0 to 65,535
&lt;/code>&lt;/pre>
&lt;p>An unsigned column stores only non-negative values (zero and above).&lt;/p>
&lt;/li>
&lt;li>High-precision scientific or manufacturing data;
&lt;pre>&lt;code class="language-SQL">float( p , s ) −3.402823466E+38 to −1.175494351E-38 and 1.175494351E-38 to 3.402823466E+38
double( p , s ) −1.7976931348623157E+308 to −2.2250738585072014E-308
and 2.2250738585072014E-308 to 1.7976931348623157E+308
&lt;/code>&lt;/pre>
&lt;p>The optional parameters p and s set the precision (the total number of allowable digits on both sides of the decimal point) and the scale (the number of allowable digits to the right of the decimal point), so the number of digits to the left of the decimal point is p - s.&lt;/p>
&lt;/li>
&lt;/ol>
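&lt;p>The integer ranges above follow directly from the storage size of each type. A short Python sketch, using the MySQL storage sizes in bytes; the helper names &lt;code>signed_range&lt;/code> and &lt;code>unsigned_range&lt;/code> are made up for this example.&lt;/p>
&lt;pre>&lt;code class="language-python"># Storage sizes in bytes for the MySQL integer types listed above.
sizes = {'tinyint': 1, 'smallint': 2, 'mediumint': 3, 'int': 4, 'bigint': 8}

def signed_range(n_bytes):
    # One bit is spent on the sign, so half the values are negative.
    bits = 8 * n_bytes
    return -2 ** (bits - 1), 2 ** (bits - 1) - 1

def unsigned_range(n_bytes):
    # All bits hold magnitude, so the range starts at zero.
    return 0, 2 ** (8 * n_bytes) - 1

for name, n_bytes in sizes.items():
    print(name, signed_range(n_bytes), unsigned_range(n_bytes))
&lt;/code>&lt;/pre>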
&lt;h3 id="temporal-data">Temporal Data&lt;/h3>
&lt;ul>
&lt;li>The &lt;strong>future date&lt;/strong> that a particular event is expected to happen, such as shipping a customer’s order
&lt;pre>&lt;code class="language-SQL">date YYYY-MM-DD 1000-01-01 to 9999-12-31
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>The date that a customer’s order &lt;strong>was shipped&lt;/strong>
&lt;pre>&lt;code class="language-SQL">datetime YYYY-MM-DD HH:MI:SS 1000-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>The &lt;strong>date and time&lt;/strong> that a user &lt;strong>modified&lt;/strong> a particular row in a table
&lt;pre>&lt;code class="language-SQL">timestamp YYYY-MM-DD HH:MI:SS 1970-01-01 00:00:00.000000 to 2038-01-18 22:14:07.999999
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>An employee’s &lt;strong>birth date&lt;/strong>
&lt;pre>&lt;code class="language-SQL">date YYYY-MM-DD 1000-01-01 to 9999-12-31
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>The &lt;strong>year&lt;/strong> corresponding to a row in a yearly_sales fact table in a data warehouse
&lt;pre>&lt;code class="language-SQL">year YYYY 1901-2155
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>The &lt;strong>elapsed time&lt;/strong> needed to complete a wiring harness on an automobile assembly line
&lt;pre>&lt;code class="language-SQL">time HHH:MI:SS −838:59:59.000000 to 838:59:59.000000
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
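&lt;p>The upper bound of &lt;code>timestamp&lt;/code> looks like the limit of a signed 32-bit count of seconds since 1970-01-01 UTC. That reading is an assumption rather than something stated in these notes, and the UTC-5 offset below is chosen only because it reproduces the quoted bound. A quick Python check:&lt;/p>
&lt;pre>&lt;code class="language-python">from datetime import datetime, timedelta, timezone

# Largest value of a signed 32-bit count of seconds since 1970-01-01 UTC.
max_seconds = 2 ** 31 - 1

as_utc = datetime.fromtimestamp(max_seconds, tz=timezone.utc)
# UTC-5 is an assumption chosen because it reproduces the bound quoted above.
as_utc_minus_5 = as_utc.astimezone(timezone(timedelta(hours=-5)))

print(as_utc.isoformat())
print(as_utc_minus_5.isoformat())
&lt;/code>&lt;/pre>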
&lt;h3 id="bouns-find-current-time">BONUS: Find Current Time&lt;/h3>
&lt;p>To find the current date/time:&lt;/p>
&lt;pre>&lt;code class="language-SQL">SELECT now();
/*2019-04-04 20:44:26 Timezone not included*/
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">Sys.time()
# &amp;quot;2021-05-25 10:58:06 EDT&amp;quot;, Timezone included
&lt;/code>&lt;/pre>
&lt;p>In Oracle, add &lt;code>FROM dual&lt;/code> to the query; &lt;code>dual&lt;/code> is a built-in dummy table (think of a &lt;em>dummy variable&lt;/em>!).&lt;/p></description></item><item><title>Learning SQL Notes #1</title><link>https://siqi-zheng.rbind.io/post/2021-05-26-sql-notes-1/</link><pubDate>Tue, 25 May 2021 18:00:00 +0000</pubDate><guid>https://siqi-zheng.rbind.io/post/2021-05-26-sql-notes-1/</guid><description>&lt;ul>
&lt;li>
&lt;a href="#introduction-to-databases">Introduction to Databases&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#more-about-relational-databases">More about Relational Databases&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#find-databases">Find Databases&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#find-a-table">Find a Table&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#create-a-table">Create a Table&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#add-a-row">Add a Row&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#change-a-cell">Change a Cell&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#delete-a-row">Delete a Row&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#table-overview">Table Overview&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#show-tables">Show Tables&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#drop-a-table">Drop a Table&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#export-to-xml">Export to XML&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;a href="#table-creation-ch-2">Table Creation (CH. 2)&lt;/a>
&lt;ul>
&lt;li>
&lt;a href="#1---design">1 Design&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#2---refinement">2 Refinement&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#3---building-sql-schema-statements">3 Building SQL Schema Statements&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="introduction-to-databases">Introduction to Databases&lt;/h2>
&lt;ul>
&lt;li>SQL was initially created to be the language for generating, manipulating, and retrieving data from relational databases.&lt;/li>
&lt;li>A database is a set of related information.&lt;/li>
&lt;li>&lt;em>Database systems&lt;/em> are computerized data storage and retrieval mechanisms.&lt;/li>
&lt;li>&lt;em>Nonrelational Database Systems&lt;/em>:
&lt;ul>
&lt;li>In a &lt;em>hierarchical&lt;/em> database system, for example, data is represented as one or more tree structures. The hierarchical database system provides tools for locating a particular customer’s tree and then traversing the tree to find the desired accounts and/or transactions. Each node in the tree may have either zero or one parent and zero, one, or many children.&lt;/li>
&lt;li>A &lt;em>network&lt;/em> database system exposes sets of records and sets of links that define relationships between different records.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>In the &lt;em>relational model&lt;/em>, data is represented as sets of &lt;em>tables&lt;/em>. Rather than using pointers to navigate between related entities, redundant data is used to link records in different tables.&lt;/li>
&lt;/ul>
&lt;h3 id="more-about-relational-databases">More about Relational Databases&lt;/h3>
&lt;ol>
&lt;li>The number of columns and rows in a table is constrained by &lt;em>physical limits&lt;/em> and by &lt;em>maintainability&lt;/em>;&lt;/li>
&lt;li>&lt;em>Primary key&lt;/em> includes information that &lt;strong>uniquely identifies&lt;/strong> a row in that table;
&lt;ol>
&lt;li>If it spans more than one column, it is a &lt;em>compound key&lt;/em>;&lt;/li>
&lt;li>If it is built from real-world data, say, first name, it is a &lt;em>natural key&lt;/em>;&lt;/li>
&lt;li>If it is an arbitrary id, it is a &lt;em>surrogate key&lt;/em>;&lt;/li>
&lt;li>A primary key value should &lt;strong>NEVER be allowed to change&lt;/strong> once assigned!&lt;/li>
&lt;li>Possible error:
&lt;pre>&lt;code>ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>A table may hold more identifiers than its own &lt;em>primary key&lt;/em>: &lt;em>foreign keys&lt;/em> identify rows in other tables and so connect the entities in different tables;&lt;/li>
&lt;li>Make sure that there is only &lt;strong>one place&lt;/strong> in the database that holds, say, the customer’s name; otherwise, the data might be changed in one place but not another, causing the data in the database to be unreliable. The process of refining a database design to ensure that each independent piece of information is in only &lt;strong>one place&lt;/strong> (except for foreign keys) is known as &lt;em>normalization&lt;/em>. (Think about the concept of &lt;em>Tidy Data&lt;/em> in &lt;strong>R&lt;/strong>!)&lt;/li>
&lt;li>Two-column primary key is also possible depending on the context (CH.2);&lt;/li>
&lt;li>A foreign key constraint restricts the values of a column to those that exist in another table (CH.2). Possible error:
&lt;pre>&lt;code>ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails ('sakila'.'favorite_food', CONSTRAINT 'fk_fav_food_person_id' FOREIGN KEY
('person_id') REFERENCES 'person' ('person_id'))
&lt;/code>&lt;/pre>
&lt;/li>
&lt;li>Ways to generate primary keys:
&lt;ul>
&lt;li>Look at the largest value currently in the table and add one.&lt;/li>
&lt;li>Let the database server provide the value for you.&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-SQL">ALTER TABLE table_name MODIFY col_0 SMALLINT UNSIGNED AUTO_INCREMENT;
set foreign_key_checks=0; /*IMPORTANT*/
ALTER TABLE person
MODIFY person_id SMALLINT UNSIGNED AUTO_INCREMENT;
set foreign_key_checks=1; /*IMPORTANT*/
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ol>
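&lt;p>The two errors quoted above can be reproduced with a pair of small tables. This is a minimal sketch: the table names, &lt;code>person_id&lt;/code>, and the constraint name &lt;code>fk_fav_food_person_id&lt;/code> are taken from the error message, while the remaining columns (&lt;code>fname&lt;/code>, &lt;code>lname&lt;/code>, &lt;code>food&lt;/code>) and the sample values are assumptions. It also assumes the default InnoDB storage engine, which enforces foreign keys.&lt;/p>
&lt;pre>&lt;code class="language-SQL">CREATE TABLE person
(person_id SMALLINT UNSIGNED,
fname VARCHAR(20), /*assumed column*/
lname VARCHAR(20), /*assumed column*/
CONSTRAINT pk_person PRIMARY KEY (person_id)
);

CREATE TABLE favorite_food
(person_id SMALLINT UNSIGNED,
food VARCHAR(20), /*assumed column*/
CONSTRAINT pk_favorite_food PRIMARY KEY (person_id, food), /*two-column primary key*/
CONSTRAINT fk_fav_food_person_id FOREIGN KEY (person_id)
  REFERENCES person (person_id)
);

INSERT INTO person (person_id, fname, lname) VALUES (1, 'William', 'Turner');
INSERT INTO favorite_food (person_id, food) VALUES (1, 'pizza');

/*Should fail with ERROR 1062: person_id 1 is already taken*/
INSERT INTO person (person_id, fname, lname) VALUES (1, 'Susan', 'Smith');

/*Should fail with ERROR 1452: no person has person_id 999*/
INSERT INTO favorite_food (person_id, food) VALUES (999, 'nachos');
&lt;/code>&lt;/pre>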
&lt;h3 id="find-databases">Find Databases&lt;/h3>
&lt;p>To get to the &lt;code>mysql&amp;gt;&lt;/code> prompt, start the client from the command line:&lt;/p>
&lt;pre>&lt;code>mysql -u root -p
&lt;/code>&lt;/pre>
&lt;p>Then type &lt;code>show databases;&lt;/code> to display all databases.&lt;/p>
&lt;h3 id="find-a-table">Find a Table&lt;/h3>
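&lt;p>To select the database that holds the table, type &lt;code>use database_name;&lt;/code>. Its tables can then be queried by name.&lt;/p>
&lt;p>Alternatively, name the database when starting the client:&lt;/p>
&lt;pre>&lt;code>mysql -u root -p database_name
&lt;/code>&lt;/pre>
&lt;p>In &lt;strong>R&lt;/strong>, loaded objects such as data frames are listed under the global environment.&lt;/p>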
&lt;h3 id="create-a-table">Create a Table&lt;/h3>
&lt;pre>&lt;code class="language-SQL">CREATE TABLE table_name /*Create a table named table_name*/
(col_0 SMALLINT,
col_1 VARCHAR(30),
col_2 TIMESTAMP,
CONSTRAINT pk_col_0 PRIMARY KEY (col_0) /*set col_0 as primary key*/
); /*The most basic method to create a table; column definitions are separated by commas*/
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-R">df &amp;lt;- data.frame(
  x0 = c(5, 3, 9, 2, 4),
  x1 = c(7, 3, 2, 9, 0),
  x2 = c(4, 4, 1, 1, 8)
)
# A data frame has no primary key constraint;
# uniqueness of a key column must be checked manually, e.g.:
stopifnot(anyDuplicated(df$x0) == 0)
&lt;/code>&lt;/pre>
&lt;h3 id="add-a-row">Add a Row&lt;/h3>
&lt;pre>&lt;code class="language-SQL">INSERT INTO table_name (col_0, col_1, col_2) /*The table*/
VALUES (27, 'Rdm Name', 'Acme Paper Corporation'); /*The values*/
/*The most basic method to insert a full row into a table*/
&lt;/code>&lt;/pre>
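&lt;p>&lt;code>Query OK, 1 row affected&lt;/code> $\Rightarrow$ one row was added to the table.&lt;/p>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r"># Build the new row as a one-row data frame so each column keeps its own type
# (assumes df is empty or already has the columns col_0, col_1, col_2)
new_row &amp;lt;- data.frame(col_0 = 27, col_1 = 'Rdm Name', col_2 = 'Acme Paper Corporation')
df &amp;lt;- rbind(df, new_row) # rbind() returns a new data frame, so reassign it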
&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>You are not required to provide data for every column in the table unless the column cannot be NULL;&lt;/li>
&lt;li>MySQL will convert the &lt;strong>string&lt;/strong> to a &lt;strong>date&lt;/strong> for you as long as the &lt;strong>default format&lt;/strong> (YYYY-MM-DD) &lt;strong>is followed&lt;/strong>; otherwise an error is raised:
&lt;pre>&lt;code>ERROR 1292 (22007): Incorrect date value: 'DEC-21-1980' for column 'birth_date' at row 1
&lt;/code>&lt;/pre>
&lt;/li>
&lt;/ul>
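&lt;p>Both points can be illustrated with a short sketch. It assumes a &lt;code>person&lt;/code> table with a &lt;code>birth_date&lt;/code> column of type &lt;code>DATE&lt;/code> (the column named in the error above) plus hypothetical &lt;code>person_id&lt;/code>, &lt;code>fname&lt;/code>, and nullable &lt;code>lname&lt;/code> columns; the sample values are made up.&lt;/p>
&lt;pre>&lt;code class="language-SQL">/*lname is omitted, so it is set to NULL (assuming the column allows NULL)*/
INSERT INTO person (person_id, fname, birth_date)
VALUES (2, 'Susan', '1980-12-21'); /*default format, converted automatically*/

/*A string in another format must be converted explicitly*/
INSERT INTO person (person_id, fname, birth_date)
VALUES (3, 'Tom', STR_TO_DATE('DEC-21-1980', '%b-%d-%Y'));
&lt;/code>&lt;/pre>
&lt;p>In the format string, &lt;code>%b&lt;/code> is the abbreviated month name, &lt;code>%d&lt;/code> the day of the month, and &lt;code>%Y&lt;/code> the four-digit year.&lt;/p>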
&lt;h3 id="change-a-cell">Change a Cell&lt;/h3>
&lt;pre>&lt;code class="language-SQL">UPDATE table_name
/*Fix column*/ /*Insert the values*/
SET name = 'Certificate of Deposit'
WHERE col_2 = 'CD'; /*Fix row, otherwise all will be replaced*/
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">df[df$col_2=='CD', &amp;quot;name&amp;quot;] &amp;lt;- 'Certificate of Deposit'
# Fix column, fix row
&lt;/code>&lt;/pre>
&lt;h3 id="delete-a-row">Delete a Row&lt;/h3>
&lt;pre>&lt;code class="language-SQL">DELETE FROM table_name /*DELETE removes whole rows, so no column list is given*/
WHERE col_2 = 'CD'; /*Fix row, otherwise all will be deleted*/
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">df &amp;lt;- df[df$col_2 != 'CD', ] # keep every row except those where col_2 is 'CD'
&lt;/code>&lt;/pre>
&lt;h3 id="table-overview">Table Overview&lt;/h3>
&lt;pre>&lt;code class="language-SQL">DESC favorite_food;
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">str(df)
summary(df)
dplyr::glimpse(df) # requires the dplyr package
&lt;/code>&lt;/pre>
&lt;p>These commands describe the table: its columns and their types.&lt;/p>
&lt;h3 id="show-tables">Show Tables&lt;/h3>
&lt;pre>&lt;code class="language-SQL">SHOW TABLES;
&lt;/code>&lt;/pre>
&lt;h3 id="drop-a-table">Drop a Table&lt;/h3>
&lt;pre>&lt;code class="language-SQL">DROP TABLE table_name;
&lt;/code>&lt;/pre>
&lt;h3 id="export-to-xml">Export to XML&lt;/h3>
&lt;p>Type the following in CMD:&lt;/p>
&lt;pre>&lt;code>mysql -u lrngsql -p --xml bank
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>OR&lt;/strong>, in SQL Server (this clause is not MySQL syntax):&lt;/p>
&lt;pre>&lt;code class="language-SQL">SELECT * FROM table_name
FOR XML AUTO, ELEMENTS /*IMPORTANT*/
&lt;/code>&lt;/pre>
&lt;p>Base &lt;strong>R&lt;/strong> has no equally direct way to do so.&lt;/p>
&lt;h2 id="table-creation-ch-2">Table Creation (CH. 2)&lt;/h2>
&lt;h3 id="1---design">1 Design&lt;/h3>
&lt;p>What info is needed? Make a list.&lt;/p>
&lt;h3 id="2---refinement">2 Refinement&lt;/h3>
&lt;ol>
&lt;li>Compound values, such as names or addresses, need to be separated into multiple columns;&lt;/li>
&lt;li>If a column is a list containing zero, one, or more independent items, we need another table;&lt;/li>
&lt;li>Need primary key column(s) to guarantee uniqueness.&lt;/li>
&lt;/ol>
&lt;h3 id="3---building-sql-schema-statements">3 Building SQL Schema Statements&lt;/h3>
&lt;p>Another type of constraint called a &lt;strong>check constraint&lt;/strong> constrains the allowable values for a particular column. MySQL allows a check constraint to be attached to a &lt;strong>column definition&lt;/strong>.&lt;/p>
&lt;pre>&lt;code class="language-SQL">eye_color CHAR(2) CHECK (eye_color IN ('BR','BL','GR'))
&lt;/code>&lt;/pre>
&lt;p>Possible error:&lt;/p>
&lt;pre>&lt;code>ERROR 1265 (01000): Data truncated for column 'eye_color' at row 1
&lt;/code>&lt;/pre>
&lt;p>MySQL does provide another character data type called &lt;code>enum&lt;/code> that merges the check constraint into the data type definition.&lt;/p>
&lt;pre>&lt;code class="language-SQL">eye_color ENUM('BR','BL','GR')
&lt;/code>&lt;/pre>
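&lt;p>A short sketch of how the &lt;code>enum&lt;/code> column behaves; the table name &lt;code>eye_demo&lt;/code> and its &lt;code>person_id&lt;/code> column are hypothetical. It assumes strict SQL mode, the default in recent MySQL versions; in non-strict mode the invalid value is likely to produce a warning rather than an error.&lt;/p>
&lt;pre>&lt;code class="language-SQL">/*Hypothetical table for illustration*/
CREATE TABLE eye_demo
(person_id SMALLINT UNSIGNED,
eye_color ENUM('BR','BL','GR'),
CONSTRAINT pk_eye_demo PRIMARY KEY (person_id)
);

INSERT INTO eye_demo (person_id, eye_color) VALUES (1, 'BR'); /*accepted: value is in the list*/
INSERT INTO eye_demo (person_id, eye_color) VALUES (2, 'ZZ'); /*rejected in strict mode: value is not in the list*/
&lt;/code>&lt;/pre>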
&lt;p>&lt;strong>R&lt;/strong> code:&lt;/p>
&lt;pre>&lt;code class="language-r">Enum &amp;lt;- function(...) {
  ## EDIT: use solution provided in comments to capture the arguments
  values &amp;lt;- sapply(match.call(expand.dots = TRUE)[-1L], deparse)
  stopifnot(identical(unique(values), values))
  res &amp;lt;- setNames(seq_along(values), values)
  res &amp;lt;- as.environment(as.list(res))
  lockEnvironment(res, bindings = TRUE)
  res
}
FRUITS &amp;lt;- Enum(APPLE, BANANA, MELON)
&lt;/code>&lt;/pre>
&lt;p>See &lt;a href="https://stackoverflow.com/questions/33838392/enum-like-arguments-in-r">https://stackoverflow.com/questions/33838392/enum-like-arguments-in-r&lt;/a> for further details.&lt;/p>
&lt;p>After processing the create table statement, the MySQL server returns the message &amp;ldquo;Query OK, 0 rows affected,&amp;rdquo; which tells me that the statement had no &lt;strong>syntax errors&lt;/strong>.&lt;/p></description></item></channel></rss>