Focal Point
SQL Join and NULL Values

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/1381057331/m/8591020292

July 18, 2008, 04:19 PM
Dan Pinault
SQL Join and NULL Values
I posted this in the WebFOCUS forum first but thought perhaps this is a more appropriate place...

Hello,

I am trying to build a dimensional model of some data that exists in a single table. Here is the process I've been following and the results I am getting...

A) Create dimension (we'll call it DIM1) from a collection of fields in the source table (we'll call it SOURCE). I am selecting distinct values and also adding a counter(we'll call it DIM1_KEY) Note: some of the records/fields in SOURCE contain NULL values.

B) SOURCE contains 107,000 records and DIM1 contains 3,700 records.

C) Looking at the data in DIM1 you can see that there are NULL values in the same records/fields as in SOURCE.

D) When I do an Inner Join with DIM1 as the host and SOURCE as the cross-reference the resulting data output only has 8,400 records and DIM1_KEY only has 178 distinct values.

E) If I recreate SOURCE and set MISSING=OFF (remove the NULL values) and repeat steps A-D the resulting data output contains the expected 107,000 records with 3,700 distinct values for DIM1_KEY.

My questions are...

1) Is this expected behavior?

2) Removing the NULL values from SOURCE may not be an acceptable solution. Is there a way to configure the join so it will work with the NULL values?

Thanks,

Dan


7.7.05M/7.7.03 HF6 on Windows Server 2003 SP2 output to whatever is required.
July 21, 2008, 11:48 AM
Dan Pinault
Turns out I got an answer from the WebFOCUS forum anyway...

------------
Dan

The problem is at the SQL join level. The problem is that null in any side of an expression yields false in SQL.

Thus on an equijoin "null eq null" is always false and thus nulls on either side of the join expression are never returned. This article explains it well.

http://msdn.microsoft.com/en-us/library/ms190409.aspx

The solution

1. code an SQL passthru with an ON expression that eliminates the nulls from both columns

FROM SOURCE T1 JOIN INNER DIM1 T2 ON NVL(T1.DIM1_KEY,99999) = NVL(T2.DIM1_KEY,99999)

This replaces the null values with 99999 using the Oracle function NVL (prob dialect dependent)
and allows the equality test to work.

2. Copy the tables over to wf and use wf join. wf chose the sensible option with null in that it treats it as just another value and so null = null returns true.

The left join argument provided in the Microsoft article is a bit of a red herring in that it suggests falsely that the deliberate reason the inner join does not work is that it is difficult to distinguish the results from a left outer if it does not work this way.

This message has been edited. Last edited by: hammo1j, July 21, 2008 06:48 AM
------------

Dan


7.7.05M/7.7.03 HF6 on Windows Server 2003 SP2 output to whatever is required.
July 21, 2008, 01:58 PM
Dan Pinault
FYI...

For those using MS SQL the function is

ISNULL ( check_expression , replacement_value )

If check_expression is NULL the function will return replacement_value else it will return check_expression. Note that replacement_value must conform to the same data type as check_expression.

Regards,

Dan


7.7.05M/7.7.03 HF6 on Windows Server 2003 SP2 output to whatever is required.
July 23, 2008, 02:01 PM
Dan Pinault
Here is a further update. ISNULL worked OK in a DevStudio procedure but it does not work in a Data Management Console Data Flow procedure.

The fine folks at tech support suggested I use the COALESCE function instead. That works. In fact, the text editor recognizes it and colors it red.

Here is the syntax:
COALESCE(check_expression,replacement_value)

In my example I used 9876 as the replacement_value. For fields with a numeric format I could simply use 9876 but for fields with a text format I had to put the 9876 is single quotes as '9876'

So,
COALESCE(text_field,'9876') or
COALESCE(numeric_field,9876)

With this function you can actually use as many array items as you want. It simply returns the first one with a Non-NULL value. For example you could use COALESCE(home_phone, cell_phone, 5555555). If home_phone has a value that is what you get, if home_phone is null you get cell_phone, if both are null you get 5555555, etc...


7.7.05M/7.7.03 HF6 on Windows Server 2003 SP2 output to whatever is required.
August 14, 2013, 01:16 PM
cs_source
quote:
COALESCE(check_expression,replacement_value)


Hi, so i have stumbled upon this problem myself and i'm wondering how you got the Data Migrator piece to work? i have attempted the colesce function but can't seem to get it going.... below are my 2 tables as well as the join as i understand it:

Table A
Name: John
Location: Canada
Name: WebFocus
Location:

Table B
Location: Canada
Account: 123456
Location: NULL
Account: 99999

COALESCE(T1.Location , 'NA' ) = T2.LOCATION

I do get an error when i run this but it does take time to process


WebFocus 8.02, SQL Server 2008r2
August 15, 2013, 10:10 AM
Clif
While I think the term "best practice" is overused because "best" is subjective but for loading dimension tables I would say it's best to avoid NULL values at least for character columns and always load some character value such as "NA" instead.

For one thing NULL is never equal to NULL or anything else. So in cs_source's example you should have a COALESCE on both sides of the equal sign.


N/A