
Dec 30, 2023

LinkedIn: Lakehouse Analytics - with Microsoft Fabric and Azure Databricks

Today I came across a post on linkedin.com which points to this nice booklet:


The LinkedIn post pointed to a site where you can register for a PDF, which contains 20 pages and 7 chapters.

Chapter one is a very short one (only half a page): a typical introduction to data, information, and analytics, and why this is important :)

In chapter 2 the lakehouse architecture is explained. I liked the phrase "It combines [...] traditional data warehouse with the massive scale and flexibility of a data lake". This phrase, combined with a very good table of the differences between a data warehouse and a data lake, is from my point of view an excellent definition.

"Data management and analytics with Microsoft Fabric and Azure Databricks" is the title of the third chapter. This chapter only emphasizes that Fabris and Databricks can work seamlessly together and Microsoft introduces a OneLake to simplify the integration of these tools.

Chapter 4 I cannot really summarize here, but it contains a really cool figure. Here is only a part of it:

The Databricks part is missing, and some other parts as well, but in the new Microsoft approach Fabric consists not only of storage: even Power BI is a part of that new powerful tool (one subsection is about AI integration).

The next chapter, "Code faster with GitHub Copilot, Visual Studio Code, and Azure Databricks", demonstrates "the power of Azure Databricks as a leading platform for data and AI when combined with developer tools such as Visual Studio Code and GitHub Copilot". It is a small walkthrough of how to configure Visual Studio Code.

In the seventh chapter a step-by-step guide is provided for integrating Databricks with OneLake.

In my eyes chapter 4 is the key chapter of the booklet for everyone who wants to know how the terms Fabric, OneLake, Databricks, and Lakehouse are related and what the big picture looks like. Anyone who analyzes data with Microsoft should have read this.

Feb 20, 2023

LinkedIn: A Guide to Data Governance - Building a roadmap for trusted data

On LinkedIn, "The Cyber Security Hub" shared a nice booklet about data governance:


And like always: it is only a booklet with about 25 pages, so this is not really a deep dive into the topic, but it gives you a good overview and of course a good motivation:

These include the need to govern data to maintain its quality as well as the need to protect it. This entails the prerequisite need to discover data in your organization with cataloguing, scanning, and classifying your data to support this protection.

And if this is too abstract, you should consider the following use case (and I think this use case has to be considered):

However, for AI to become effective, the data it is using must be trusted. Otherwise decision accuracy may be compromised, decisions may be delayed, or actions missed which impacts on the bottom line. Companies do not want ‘garbage in, garbage out’.

The booklet contains the sections "Requirements for governing data in a modern enterprise", "Components needed for data governance", "Technology needed for end-to-end data governance", and "Managing master data". None of these sections provides a walkthrough for achieving good data governance, but they list many questions which you should answer for your company before moving forward.

If you already have data governance in place: this booklet is a good challenge for your solution, and for sure you will find some points which are missing :)


Dec 12, 2007

Official Oracle Wiki: Helpful?!?

Oracle started their official wiki some weeks ago.
But is this wiki really helpful? Who should contribute to this Web 2.0 thing?

For example, read a blog entry about the OWB (Oracle Warehouse Builder) wiki pages.

Wikis-- especially corporate wikis-- are a tricky business. On the one hand, users and corporate staff can share information and let the wisdom of crowds emerge. On the other hand, a company can pitch up a wiki, call that their support strategy, and in essence tell their customers "Our product documents and supports itself. Isn't Web 2.0 fun?"


If you look there, you will find many threads which should be in forums.oracle.com.
Really helpful information? Not yet...

But Web 2.0 applications need time to find their scope. So let's wait, or better: contribute to make the wiki successful!

Dec 9, 2007

Partitioning & Transportable Tablespaces

Here is an example of how to use partitioning together with transportable tablespaces in an Oracle database:

Create Tablespaces

To create tablespaces, put the following lines into a file createTablespace.sql

-- &1 is the first command line argument passed to sqlplus: the tablespace name
CREATE SMALLFILE TABLESPACE "&1" DATAFILE '/opt/oracle/oradata/XYNADB/datafile/&1.dbf'
SIZE 100M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;
exit;

and start the script via sqlplus with sqlplus system/pwd @createTablespace.sql TTS01.
It will create a tablespace named TTS01. Repeat this step for TTS00 and TTS02.

Create Table

To create a partitioned table, enter the following lines:

create table parttest (
id number, name varchar2(40), payload varchar2(40))
partition by range (id)
(partition part00 values less than (100) tablespace tts00,
partition part01 values less than (1000) tablespace tts01,
partition part02 values less than (10000) tablespace tts02);
exit;

Insert Data

Now put any data you want into the table parttest (the id has to be lower than 10000).
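
For illustration, here are a few hypothetical sample rows (the values are made up), one for each partition range defined above:

insert into parttest values (50, 'alpha', 'goes to part00');
insert into parttest values (500, 'beta', 'goes to part01');
insert into parttest values (5000, 'gamma', 'goes to part02');
commit;
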
Just to verify that the partitioning is working, use the following command:

select * from parttest partition (part01);
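
You can also check in the data dictionary which partition is stored in which tablespace (USER_TAB_PARTITIONS is a standard Oracle view):

select partition_name, tablespace_name
from user_tab_partitions
where table_name = 'PARTTEST';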

Adding new Partitions

To add new partitions, you have to create a new tablespace (see above) and then enter the following command:

alter table parttest add partition part03 values less than (20000) tablespace tts03;

Drop an old partition

To drop an old partition, use the following command:

alter table parttest drop partition part00;

But when you issue this command, the data in this partition is gone, and there is no rollback!

Anything new?

Now we have a table partitioned over several tablespaces. We can add partitions and drop them.
The advantage is that performance is better than with one unpartitioned table, because the optimizer uses the partitioning to get faster access to the data (partition pruning).
But if you could move the partitions without dropping them, that would be a real enhancement.
Here are the commands for doing exactly that:

Moving the data

First you have to create a table with the same structure as your partitioned table (but without the partitions) in the same tablespace where the data you want to move resides.
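
A minimal sketch (parttestnew is the table name used in the exchange command below; tablespace tts01 because partition part01 lives there):

create table parttestnew (
id number, name varchar2(40), payload varchar2(40))
tablespace tts01;
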
After that you can exchange the data of your partitioned table with the data of the new table:

alter table parttest exchange partition part01 with table parttestnew;

After that you can drop the partition on the original table parttest without losing your data, which is still there in the table parttestnew. And if you want, you can move the data with the transportable tablespace feature from Oracle to another instance…
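
A minimal sketch of that last step, assuming Data Pump is available and a directory object named dpump_dir exists (names, paths, and passwords are illustrative):

-- optionally verify that the tablespace is self-contained:
exec dbms_tts.transport_set_check('TTS01', TRUE);
select * from transport_set_violations;
-- make the tablespace read only before transporting it:
alter tablespace tts01 read only;
-- export the metadata on the source host (operating system shell):
-- expdp system/pwd directory=dpump_dir dumpfile=tts01.dmp transport_tablespaces=TTS01
-- copy tts01.dbf and the dump file to the target host, then import there:
-- impdp system/pwd directory=dpump_dir dumpfile=tts01.dmp transport_datafiles='/opt/oracle/oradata/TARGET/datafile/tts01.dbf'
-- finally make the tablespace writable again:
alter tablespace tts01 read write;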

At GIP these features are used in the Xyna Service Warehouse (take a look at my guest post there!).