Skip to content
Advertisement

SQL code patterns mapped to Matlab operations on `table’s`

I’m quite comfortable (and somewhat critically dependent) on Microsoft Access (SQL 92) code idioms patterns, including nested SELECTs, various JOINs, filtering, column ordering, and aliasing of columns and tables.

I’m starting to feebly finding my way around Matlab’s equivalent operations on tables.

Other than the examples shown for the individual commands, is there a (possibly small) compendium of SQL code patterns and equivalent Matlab code? In case it matters, I’m using Matlab 2015b.

Other than code patterns, I am also hoping to get savvy at issues like type conversion when applying aggregation operations on table records. I’ve deliberately used uint16 to contain the memory footprint of my table because I know the value range of the fields are limited. Having never worried about basic type control before, however, I am only now discovering gotchas, like no automatic type promotion when the (possibly intermediate) results of an expression evaluation exceeds the range that is supported by the operand types. I don’t recall ever having to worry about this is SQL, but if there are any advised practices to handle or skirt these issues in Matlab, thanks for pointing me to them.

Advertisement

Answer

Tables are the closest thing to a SQL dataset you can find in Matlab. The first thing you should do is to read this official documentation page very carefully, because it teaches you the basics for accessing and manipulating data in a table object.

Once it’s done, you will be halfway to your goal. A few good examples and some training will finally make you reach the peak. Let’s take this table as reference for the upcoming cheatsheet:

load patients; %% Load default data...
patients = table(Age,Gender,Height,Weight); %% Create the table...

SELECT

Basic SELECT statements can be performed using the accessing approaches shown in the official documentation.

SQL    => SELECT * FROM Patients
Matlab => patients

SQL    => SELECT Gender, Height FROM Patients
Matlab => patients(:,{'Gender' 'Height'})

SQL    => SELECT Gender, Age FROM Patients LIMIT 5
SQL    => SELECT TOP 5 Gender, Age FROM Patients
Matlab => patients(1:5,{'Gender' 'Age'})

Be careful with variable assignments when performing selections. If you assign the result of a selection to the variable that is being used for storing the dataset, then the original dataset is replaced and lost forever:

patients = table(Age,Gender,Height,Weight); %% Dataset is created...
patients = patients(1:5,{'Gender' 'Age'}); %% Dataset is replaced...

In order to avoid this, store the result of your selections into another variable, as follows:

patients = table(Age,Gender,Height,Weight); %% Dataset is created...
patients_selection = patients(1:5,{'Gender' 'Age'}); %% Dataset is not replaced...

Nested SELECT statements or SELECT ... FROM (SELECT ... FROM ...) structures are just distinct selections performed on intermediate results (signally, variables that store the intermediate results).

DISTINCT

The DISTINCT clause for selections can be performed using the unique function.

SQL    => SELECT DISTINCT * FROM Patients
Matlab => unique(patients)

SQL    => SELECT DISTINCT Gender FROM Patients
Matlab => unique(patients(:,{'Gender'}))

WHERE

Filters applied by WHERE clauses are a little bit trickier to obtain. You must create logical indexers based on table columns, and use them to select the proper rows.

SQL    => SELECT Height FROM Patients WHERE Gender = 'Female'
Matlab => patients(strcmp(patients.Gender,'Female'),{'Height'})

SQL    => SELECT Weight FROM Patients WHERE Age >= 30
Matlab => patients(patients.Age >= 30,{'Weight'})

SQL    => SELECT Gender, Age, Weight FROM Patients WHERE (Gender = 'Male' AND Age >= 50) OR (Gender = 'Female' AND Age <= 50)
Matlab => idx = (strcmp(patients.Gender,'Male') & (patients.Age >= 50)) | (strcmp(patients.Gender,'Female') & (patients.Age <= 50));
          patients(idx,{'Gender' 'Age' 'Weight'})

ORDER BY

The ORDER BY statement equivalent is the sortrows function.

SQL    => SELECT * FROM Patients ORDER BY Height ASC, Weight DESC
Matlab => sortrows(patients,{'Height','Weight'},{'ascend','descend'})

SQL    => SELECT Height, Weight FROM Patients WHERE Gender = 'Female' ORDER BY Height ASC, Weight DESC
Matlab => sortrows(patients(strcmp(patients.Gender,'Female'),{'Height','Weight'}),{'Height','Weight'},{'ascend','descend'})

GROUP BY

Grouping statements can be achieved using the findgroups function, and data aggregation (for example AVG and SUM) can be performed using the splitapply function coupled with the former.

SQL    => SELECT Gender, SUM(Weight) AS 'TotalWeight' FROM Patients GROUP BY Gender
Matlab => [groups,Gender] = findgroups(patients.Gender);
          TotalWeight = splitapply(@sum,patients.Weight,groups);
          table(Gender,TotalWeight)

SQL    => SELECT Gender, Weight, AVG(Height) AS 'AverageHeight' FROM Patients GROUP BY Gender, Weight ORDER BY Gender ASC, Weight ASC, AverageHeight ASC
Matlab => patients40 = patients(patients.Age <= 40,:);
          [groups,Gender,Weight] = findgroups(patients40.Gender,patients40.Weight);
          AverageHeight = splitapply(@mean,patients40.Height,groups);
          sortrows(table(Gender,Weight,AverageHeight),{'Gender' 'Weight' 'AverageHeight'},{'ascend' 'ascend' 'ascend'})

COUNT

The total number of rows in a table can be retrieved using the height function.

SQL    => SELECT COUNT(*) FROM Patients
Matlab => height(patients)

SQL    => SELECT COUNT(*) FROM Patients WHERE Gender = 'Male'
Matlab => height(patients(strcmp(patients.Gender,'Male'),:))

JOINS, INTERSECTIONS & UNIONS

Joins can be performed using the built-in join, innerjoin and outerjoin functions. Intersections and unions can be performed, respectively, using the intersect function and the union function. Unions can also be performed by concatenating two tables (provided they have the same number of columns and the same column names) using the following syntax:

MyTable3 = [MyTable1; MyTable2];

The examples provided in the links should be easy enough to understand and reproduce.

INSERT, DELETE & UPDATE

In this official documentation page you should find all what you need in order to perform insertions and deletion of rows in tables. Insertions can be quickly performed using concatenation:

SQL    => INSERT INTO Patients (Age,Gender,Height,Weight) VALUES (32,'Female',182,72)
Matlab => patients = [patients; {32,'Female',182,72}]

While deletions are performed on a row-wise indexing basis:

SQL    => DELETE FROM Patients WHERE Gender = 'Male' AND Age <= 35
Matlab => idx = strcmp(patients.Gender,'Male') & (patients.Age <= 35);
          patients(idx,:) = []

Updates are a little bit different to handle. You can use a direct column access together with a predefined indexing in order to perform them easily:

SQL    => UPDATE Patients SET Age = 50, Hight = 175, Weight = 70 WHERE Gender = 'Male'
Matlab => idx = strcmp(patients.Gender,'Male');
          patients.Age(idx) = 40;
          patients.Height(idx) = 175;
          patients.Weight(idx) = 70;

EDIT

A few days ago, I created a Matlab script that probably provides what you are looking for. Basically, it’s a SQL wrapper around Matlab tables. Take a look at it by visiting the following page:

http://mathworks.com/matlabcentral/fileexchange/68771-table-sql

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement