Oracle

Count with Distinct

Count with Distinct


By Yuli Vasiliev

Function COUNT can be used with the DISTINCT clause, which makes COUNT consider only distinct values of the specified expression. There are several ways in which you can use COUNT with DISTINCT. This article covers them all, using as an example the part table that you will get as a result of reverse-engineering process invoked against the PurchaseOrders.dmp datastore created in ODI Studio as discussed in my previous article: Including JSON in ODI 12c.

Actually, a reverse-engineering process in ODI may be not so perfect when it comes to building a normalized relational model. Thus, reverse-engineering the content of the PurchaseOrders.dmp file, as discussed in Including JSON in ODI 12c leads to creating a set of relational tables which contain repeated data, which is also known as data redundancy. Put simply, a table may contain repeated rows, which is not always what you want when querying it.

Let’s take a closer look at the part table. First, let’s count all the rows in this table:

Note: Within ODI Studio, you can issue a custom query from within the New Query dialog, which you can invoke by clicking the New Query button in the Data panel’s toolbar.

select COUNT(*) from XSDSCHEMA.PART

which should show the following number of rows in the table:

45260

Now let’s employ COUNT with DISTINCT with the following query:

select DISTINCT COUNT(*) from XSDSCHEMA.PART

You may be surprised to see the same number of rows in the result:

45260

To understand why this is so, you will need to examine the part table structure. In ODI Studio, this can be found on the Designer tab, under Models. Just expand the PART->Attributes node to view the part’s attributes. Among others, you should see attribute LINEITEMSFK, which is the foreign key pointing to the LINEITEMS table. Looking further, you will notice that what we actually have here is a one-to-one relationship, which makes each row in the part table unique.

So, let’s now omit the LINEITEMSFK attribute when counting rows. In other words, you should include all of the part attributes in the argument expression of the COUNT, excluding only LINEITEMSFK. To accomplish that, you can issue the following query:

select COUNT(*) from (

select DISTINCT DESCRIPTION_, DESCRIPTION_ORDER, PARTORDER, UNITPRICE, UNITPRICEORDER, UPCCODE, UPCCODEORDER from XSDSCHEMA.PART

)

Or this one:

select COUNT(DISTINCT DESCRIPTION_, DESCRIPTION_ORDER, PARTORDER, UNITPRICE, UNITPRICEORDER, UPCCODE, UPCCODEORDER) from XSDSCHEMA.PART

In either case, you should see the following result of row counting:

5239

Examining the attributes, you can easily guess that UPCODE might be used for distinguishing different parts here. So, you could simply put only the UPCCODE attribute in the argument expression, as follows:

select COUNT(DISTINCT UPCCODE) from XSDSCHEMA.PART

Expectedly, this should produce the same result as previously:

5239

Conclusion

In this article, you looked at some use cases of COUNT with DISTINCT.

About the Author

Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open source development, Java technologies, business intelligence (BI), databases, and service-oriented architecture (SOA). He is the author of Oracle Business Intelligence: The Condensed Guide to Analysis and Reporting (Packt, 2010) and PHP Oracle Web Development (Packt, 2007) as well as a series of other books on the Oracle technology.