Extremely large Umbraco database (5.5GB), table cmsPropertyData contains over 13.5 million records! Options
Sunday Ironfoot
Posted: Friday, April 18, 2008 1:14:39 PM
Rank: Newbie

Joined: 10/8/2007
Posts: 16
I'm trying to troubleshoot someone's Umbraco site.

The main problem is that it's running slowly and the database seems to be increadibly large (around 5.5GB). I ran some sql to find where this space is going and most of it is in the 'cmsPropertyData' table which contains nearly 14 million records (it takes up 2.5GB space alone, 1.5GB actual data). It's a fairly large umbraco instance with about 10 sites and each site is in 4 different languages (effectively four copies of each site). But does 14 million records seem a bit abnormally large to you?

I tried running the following SQL against the database...

SELECT COUNT(dataNvarchar)
FROM cmsPropertyData
WHERE dataNvarchar IS NOT NULL
AND dataNvarchar <> ''

...to get the number of records that are using the nvarchar data type (a lot of the page property nodes are using this type) and the count is 5'121'451. However, when I run this sql...

SELECT COUNT(DISTINCT dataNvarchar)
FROM cmsPropertyData
WHERE dataNvarchar IS NOT NULL
AND dataNvarchar <> ''

...(notice the 'distinct' syntax) to get unique values the count is 28'787 so it seems there's a hell of a lot of duplicate data in there. What's more it seems to keep on climbing, everytime I do a total count of that table it goes up a little more, and the database has continued to gradually increase in size. The 28'787 value seems to never go up however.

I'm completely lost as to why it's doing this and what is going on. Can anyone suggest any reason as to why it could be doing this? Areas I could investigate further? Thanks for any help!

Dominic Pettifer
mortenbock
Posted: Friday, April 18, 2008 1:24:52 PM

Rank: Addict

Joined: 7/19/2006
Posts: 789
Location: Århus, Denmark
I don't know if this is the source for your problem, but could it possibly be related to this bug, that has just been fixed in the 3.0.5 release?

Release/Expire dates can cause versions numbers to go high

There is not a lot of description on the reported bug, but if the site uses the release/expire features, then that might be it?

Umbraco saves a new record each time a document is saved/published, so if some of the documents are publishing themselves all the time, that could cause the numbers to blow up.

So maybe doing an update to 3.0.5 could help.

Also you might want to check out the Client Tools for Umbraco to do a cleanup of the existing database, to bring down the number of existing versions?

Morten Bock - Level 2 certified - MVP 2008/2009 - My danish blog with a few english posts

sjors
Posted: Friday, April 18, 2008 1:26:59 PM

Rank: Fanatic

Joined: 7/20/2006
Posts: 408
Location: Amsterdam
Don't know or its related, but a few months ago we saw a dramatic load on our (load balanced) database envirenment, we are running a cluster of 6 heavey SQL server machines, and one umbraco process did use 20% of CPU power.

After a small research we found out that one document was published over and over, and resulted in:

select count(*)as aantal, nodeId from cmsDocument
group by nodeId
order by aantal desc

aantal nodeId
1414664 2736
811279 2822
93 1272

It was caused because there was:

- 1 record Latest=1, Published=1, + older VersionId
- 1 record Latest=0, Published=0, + newer VersionId.

So every minut the update process entered new data in the database and cluthering everything up :(

After manually updating we resolved the problem, we have still 4 gigs of umbraco crap in our database ;)
imayat12
Posted: Friday, April 18, 2008 1:38:27 PM

Rank: Addict

Joined: 7/19/2006
Posts: 649
Location: Preston, UK
Ironfoot,

Try http://www.codeplex.com/ClientTools4Umbraco and maybe clean out some stuff dont forget db backup just in case Angel

Level 2 certified. If it aint broke dont fix.
Sunday Ironfoot
Posted: Friday, April 18, 2008 1:39:21 PM
Rank: Newbie

Joined: 10/8/2007
Posts: 16
Thanks for the suggestions, that does help a lot.

One thing I forgot the mention, was that data was manually imported directly into the umbraco database (via TSQL scripts) instead of via the umbraco API. I've heard somewhere that this is a bad idea. Is there a chance that doing this has caused some data to get out of sync somewhere, causing various documents and nodes to get republished over and over?

Dominic Pettifer
neehouse
Posted: Friday, April 18, 2008 8:36:43 PM

Rank: Umbracoholic

Joined: 7/20/2006
Posts: 1,074
Location: Charleston, West Virginia, United States
Ironfoot,

It is possible that if you imported using SQL scripts, that the page in question did get screwed up somehow. If it is a single node that is the culprit, you may want to try deleting that node, and adding it back through the interface. Of course, if you have links pointing to the old ID, you will need to update those.

If it is more than a few pages causing the problem, then things get a bit more tedious.

When deleting the node, you will have to delete it from the trashcan as well, but, it should allow you to clear up some db space.

• 2007/2008 MVP • 2008/2009 MVP • Certified • Licensing • Support • Development • Hosting •
jesper
Posted: Monday, July 07, 2008 12:21:47 PM

Rank: Administration

Joined: 7/25/2006
Posts: 415
Location: vipperoed, denmark
I'm trying to clean a massive database for a 3.03 solution. The db is 8gb and the reason for this is a known bug that results in creating a new document version every minute or so.

I've tried Thomas H's client tool and it worked like supposed, removing a lot of versions. No node in the cmscontent has more than 3 versions.

But still I'm having a massive 8gb database with a table cmspropertyData containing over 30.000.000 records. I'm suspecting that cmspropertydata contains a LOT of records that isn't referenced at all since the version they relate to is long gone.

I think the reason Thomas H's client tools can't get to these since the client tools works by traversing existing node's versions.

My assumtions:

1) If table cmsPropertyData contains data related to notexisting versions then these will not be deleted.
2) Also if table cmsContentVersion contains data related to notexisting versions then these will not be deleted.

Some numbers:

cmsdocuments: 18.752 records
cmscontent: 10.995 records
cmscontentversion: 973.692 records
cmspropertydata: 31.273.844 records

If any of you are sitting with the sql script skills to clean this, then please through it after me. I'll wrap it in a usercontrol and share it immediatly.

Second best: If anyone can provide me with the exact database structure then I just might be able to create the sql statements... (I hate it but ...)

Please mail me at jesper(at)jesper.com

kindly,
Jesper

webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
jesper
Posted: Monday, July 07, 2008 2:24:03 PM

Rank: Administration

Joined: 7/25/2006
Posts: 415
Location: vipperoed, denmark
I have tested the below statements in a healthy solution. They all returned 0 rows. In the 8GB database, the first gives me loads of rows.

-- check cmscontent

select * FROM cmscontent
WHERE NOT EXISTS
(SELECT *
FROM cmsDocument
WHERE cmsDocument.nodeId = cmsContent.nodeId
)



-- check cmscontentversion

select * FROM cmscontentversion
WHERE NOT EXISTS
(SELECT *
FROM cmscontent
WHERE cmscontent.nodeId = cmscontentversion.ContentId
)


-- check the cmspropertydata table

select * FROM cmspropertydata
WHERE NOT EXISTS
(SELECT *
FROM cmscontentversion
WHERE cmscontentversion.versionId = cmspropertydata.versionId
)


I'm that close to changing the above to delete from statements.

Any thoughts? Other tables to include?

kindly,
Jesper

webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
drobar
Posted: Monday, July 07, 2008 3:39:54 PM

Rank: Umbracoholic

Joined: 9/8/2006
Posts: 1,696
Location: KY, USA
Hi, Jesper,

Scary thought to go in and blast a bunch of data with direct sql calls... but that's probably the only solution at this point. Need I mention... BACKUPS!

In fact, any chance you can create a duplicate site with a copy of your database? Then you can run your scripts and see what happens to the site, rollbacks, the ability to create and publish new content, etc. without impacting the live site. When you're satisfied... back up the live site one more time (just to be on the double-safe-side) and do it.

Let us know how it works out!

cheers,
doug.

MVP 2007-2009 - Official Umbraco Trainer for North America - Percipient Studios
jesper
Posted: Monday, July 07, 2008 4:33:15 PM

Rank: Administration

Joined: 7/25/2006
Posts: 415
Location: vipperoed, denmark
Hi Doug,

It's always good to mention backups. I promise that I work on a duplicate version. I executed the scripts 1 1/2 hour ago and the last statement is still running.

Kindly,
Jesper

webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
hoehler
Posted: Monday, July 07, 2008 4:45:11 PM

Rank: Addict

Joined: 7/19/2006
Posts: 597
Location: Bad Homburg, Germany
First of all: There should be no cmsPropertyData or cmsDocument or cmsContent row without a cmsConentVersion I think. Also for deleted nodes (which are in the waste) there are content and versions which are all not published. So you are right with your assumptions because the Client Tools are going from cmsVersions to the Content and the documents...

And the question is where do theses rows come from?

But I am with Doug to create backups and run the scripts...

And If this comes from a bug in 3.0.3 we should integrate these options to the Client Tools

Cheers, Thomas

• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
jesper
Posted: Monday, July 07, 2008 5:41:15 PM

Rank: Administration

Joined: 7/25/2006
Posts: 415
Location: vipperoed, denmark
Hi Thomas,

Fantastic profile picture :-)

The database has a very very long history. V2.0 upgraded to 3.0 alpha and so forth. I think that at some point someone has tried to cleanup the database manually.

Kindly,
J

webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
hoehler
Posted: Tuesday, July 08, 2008 12:08:26 AM

Rank: Addict

Joined: 7/19/2006
Posts: 597
Location: Bad Homburg, Germany
jesper wrote:
Fantastic profile picture :-)

Yap, I love it...
jesper wrote:
The database has a very very long history. V2.0 upgraded to 3.0 alpha and so forth. I think that at some point someone has tried to cleanup the database manually.

Just check the updateDate in the cmsDocument rows which doesn't have a cmsContentVersion. Perhaps you get a glue if this problem still exists?

Thomas

PS: Why am I not getting any mail from this forum??? Brick wall

• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
jesper
Posted: Tuesday, July 08, 2008 1:56:33 AM

Rank: Administration

Joined: 7/25/2006
Posts: 415
Location: vipperoed, denmark
Stay away from the scripts below as they do not work !

jesper wrote:


-- check cmscontent

select * FROM cmscontent
WHERE NOT EXISTS
(SELECT *
FROM cmsDocument
WHERE cmsDocument.nodeId = cmsContent.nodeId
)



-- check cmscontentversion

select * FROM cmscontentversion
WHERE NOT EXISTS
(SELECT *
FROM cmscontent
WHERE cmscontent.nodeId = cmscontentversion.ContentId
)


-- check the cmspropertydata table

select * FROM cmspropertydata
WHERE NOT EXISTS
(SELECT *
FROM cmscontentversion
WHERE cmscontentversion.versionId = cmspropertydata.versionId
)



webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
drobar
Posted: Tuesday, July 08, 2008 3:44:08 AM

Rank: Umbracoholic

Joined: 9/8/2006
Posts: 1,696
Location: KY, USA
Hi Jesper,

I take it that you didn't have success after all.

Here's an idea that came to me late in the evening... what if you took the good published content of the site and pushed it into a new database, rather than trying to clean the massive/existing database? You'd lose your rollback history, but that might be a small price to pay?

As always, let us know what you try and how it works out. You are surely not the only one who will eventually walk this path.

cheers,
doug.

MVP 2007-2009 - Official Umbraco Trainer for North America - Percipient Studios
jesper
Posted: Tuesday, July 08, 2008 9:10:44 AM

Rank: Administration

Joined: 7/25/2006
Posts: 415
Location: vipperoed, denmark
drobar wrote:

I take it that you didn't have success after all.
.


Nope .. you're quite right there. When using the above statements (changed "select *" to "delete") I found that I deleted way to much data. The result was that the media section was useless.

drobar wrote:

Here's an idea that came to me late in the evening... what if you took the good published content of the site and pushed it into a new database, rather than trying to clean the massive/existing database? You'd lose your rollback history, but that might be a small price to pay?


I wouldnt mind that approach .. my problem is that I don't know which data not to copy (yet).

Please help me validate the following. I believe that in a healthy db the following should return 0 (zero). Do you agree? Want to post your result?

select count(*) FROM cmspropertydata
WHERE NOT EXISTS
(SELECT *
FROM cmscontentversion
WHERE cmscontentversion.versionId = cmspropertydata.versionId
)

If the above is true then I would conclude that the following is valid:
(DON'T RUN THIS)

delete FROM cmspropertydata
WHERE NOT EXISTS
(SELECT *
FROM cmscontentversion
WHERE cmscontentversion.versionId = cmspropertydata.versionId
)

I'll continue today and let you know . If ( and I really hope that ) anyone (Ruben, Thomas ?) who has a clue on how to clean up "cmspropertydata" would post anything that could help (and email, since forum notifications are dead).

kindly,
J

webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
hoehler
Posted: Tuesday, July 08, 2008 9:31:41 AM

Rank: Addict

Joined: 7/19/2006
Posts: 597
Location: Bad Homburg, Germany
I have registered via 'watch this topic'. Now I have received five mails for your post, but no mail from the post from doug (really weird).

I first have to do some immediate work. After this I will take a look at our test system and will come back to you this morning.

• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
jonhoye
Posted: Thursday, October 16, 2008 5:05:41 PM
Rank: Newbie

Joined: 9/5/2008
Posts: 9
Location: Chattanooga, TN
Is there a way to disable rollback history or anything?

Also, has anyone had success finding a SQL command to scrub the history? Otherwise I'll post something as soon as I get it.

I'm working with an extremely small DB limit (Bad Discountasp.net) and need to figure out something.
Dirk
Posted: Thursday, October 16, 2008 5:51:31 PM

Rank: Addict

Joined: 9/27/2007
Posts: 977
Location: Belgium
Hi jonhoye,

Check out this thread on the same subject:

http://forum.umbraco.org/yaf_postst6564_305-Database-Huge--cmsPropertyData--392MB.aspx

Hope that helps.

Regards,
/Dirk


level 1 certified - umbraco blog at netaddicts.be - working on an integrated forum4umbraco
hoehler
Posted: Thursday, October 16, 2008 6:04:21 PM

Rank: Addict

Joined: 7/19/2006
Posts: 597
Location: Bad Homburg, Germany
imayat12 wrote:

Try http://www.codeplex.com/ClientTools4Umbraco and maybe clean out some stuff dont forget db backup just in case Angel


In this db size the ClientTool will run into timeouts. One of the problems are the indexes for this table so that deleting that many entries will run hours and will boost you database log file. I made a script for a friend removing the indexes, deleting the rows and recreating the indexes. I have it at home and will post it tonight...

Thomas

• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
Users browsing this topic
Guest


You cannot post new topics in this forum.
You cannot reply to topics in this forum.
You cannot delete your posts in this forum.
You cannot edit your posts in this forum.
You cannot create polls in this forum.
You cannot vote in polls in this forum.