Our Forum has Moved

This site is our old forum and is only here for achive until we get proper 301 redirects setup to make Google happy.

Please use our new community site - Our Umbraco - which contains an improved forum, documentation wiki, package repository and a member locator.

Go to Our Umbraco now

Learn everything about Umbraco
3.05 Database Huge - cmsPropertyData = 392MB! Options
radu84
Posted: Thursday, November 27, 2008 7:36:17 PM
Rank: Enthusiast

Joined: 7/23/2008
Posts: 42
Location: Copenhagen
I've just used the script from Thomas, however it doesn't affect any rows in my case, and the
Code:
SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
        AND ID not in (select Nodeid from cmsDocument)
command does not return any rows for me.

My problem is as such:

1. My cmsPropertyData table is around 78 mb.
2. When I use the ClientTools from Thomas my database is said to have : 923 Content Items, 6817 Versions, 6657 Documents and 34528 Property Data.
3. Unfortunately, if I try to delete versions using the ClientTools to clear versions it works for a very short while and then an error message appears->

Quote:
Min(2) must be less than or equal to max(-1) in a Range object





@Daniel , I've just forwarded you the script from Thomas in case you haven't gotten it yet.
daniel_l
Posted: Friday, November 28, 2008 9:25:46 AM
Rank: Fanatic

Joined: 6/25/2007
Posts: 336
Location: Malmo, Sweden
radu84 wrote:

@Daniel , I've just forwarded you the script from Thomas in case you haven't gotten it yet.


I have received it from you! Thanks!

Is it me or umbraco? Umbraco Certified Professional Level 2.
hoehler
Posted: Friday, November 28, 2008 10:40:58 AM

Rank: Addict

Joined: 7/19/2006
Posts: 634
Location: Bad Homburg, Germany
radu84 wrote:

My problem is as such:

1. My cmsPropertyData table is around 78 mb.
2. When I use the ClientTools from Thomas my database is said to have : 923 Content Items, 6817 Versions, 6657 Documents and 34528 Property Data.
3. Unfortunately, if I try to delete versions using the ClientTools to clear versions it works for a very short while and then an error message appears->


Then you don't have lost documents (which is good). The new version of the ClientTools will give you more possibilities and more control over deleting versions. The most common problem of the old version is the timeout for large databases cause the complete doing is done in one call to the webserver via webservices. The new version will give you the opportunity to delete only selected versions from one one specific node or all versions by date from a node including the subnodes, aso. So please be patient, I will update the version asap.

• 2007/2008 MVP • www.thoehler.comblog.thoehler.com • Bad Homburg, Germany
radu84
Posted: Friday, November 28, 2008 11:31:32 AM
Rank: Enthusiast

Joined: 7/23/2008
Posts: 42
Location: Copenhagen
hoehler wrote:

Then you don't have lost documents (which is good). The new version of the ClientTools will give you more possibilities and more control over deleting versions. The most common problem of the old version is the timeout for large databases cause the complete doing is done in one call to the webserver via webservices. The new version will give you the opportunity to delete only selected versions from one one specific node or all versions by date from a node including the subnodes, aso. So please be patient, I will update the version asap.


Ok Thomas. Looking forward to trying the new version.

Many Thanks
Radu
neilf
Posted: Friday, November 28, 2008 2:28:51 PM

Rank: Devotee

Joined: 3/19/2008
Posts: 69
Location: London
I've discovered a bug that is a symptom of a publishing bug. This symptom is what causes databases grow huge in load-balanced enviroments.

Another side-effect of this issue in my environment was that documents could not be properly published to load-balanced servers.

If you need an immediate fix for this problem it would require a change in the umbraco.presentation.cache.CacheRefresher() class in the Umbraco source code. If anyone knows the best way to submit patches to the core development team, please let me know and I will pass on this fix.

The bug exists in Umbraco 3.0.5 and I haven't seen any forum posts about it being fixed in V4.0, so this bug will prob also affect V4.

The bug is triggered in certain situations where a document does not finish publishing properly when someone publishes from the UI and then the background process that is set up in the global.asax.cs cannot fix the situation. The problem only happens in load balanced situations.

You will know if this bug affects you by checking the following:
1. You have a load balanced enviroment
2. The following query returns 1 or more rows

Code:
select distinct nodeId, level, sortOrder from cmsDocument inner join umbracoNode on umbracoNode.id = cmsDocument.nodeId where newest = 1 and not releaseDate is null and releaseDate <= getdate() order by level, sortOrder


Here is my analysis of the code:

In the global.asax.cs file at line 62, this code sets up a timer event to run every 60 seconds:

Code:
            publishingTimer =
                new Timer(new TimerCallback(publishingService.CheckPublishing), Context, 60000, 60000);


(Note to core team, I might suggest setting the first interval after startup to a random value between 30000 and 60000 instead of 60000, otherwise any servers that start at the same time will probably compete with each other to publish. Servers can start at the same time e.g. when Application Pools are set to recycle at a specific time of day.)

In publishingService.CheckPublishing(object sender) at line 31:

Code:
foreach(Document d in Document.GetDocumentsForRelease())
                    {
                        try
                        {
                            d.HttpContext = (HttpContext)sender;
                            d.Publish(d.User);
                            library.PublishSingleNode(d.Id);
                            d.ReleaseDate = new DateTime(1, 1, 1); // Causes release date to be null
                        }
                        catch(Exception ee)
                        {
                            Log.Add(
                                LogTypes.Error,
                                BusinessLogic.User.GetUser(0),
                                d.Id,
                                string.Format("Error publishing node: {0}", ee));
                        }
                    }


GetDocumentsForRelease() runs the sql query that I quoted above

***********************************************************************************************
Notice that d.Publish() is called here before library.PublishSingleNode()
I will explain why this is important at the end
***********************************************************************************************

Which runs this code:

Code:
public static void PublishSingleNode(int DocumentId) {
            if (UmbracoSettings.UseDistributedCalls)
                dispatcher.Refresh(
                    new Guid("27ab3022-3dfa-47b6-9119-5945bc88fd66"),
                    DocumentId);
            else
                PublishSingleNodeDo(DocumentId);
        }


And in the dispatcher.Refresh() runs this code only for load-balanced servers when UmbracoSetttings.UseDistributedCalls is TRUE:

Code:
try {
                foreach (XmlNode n in UmbracoSettings.DistributionServers.SelectNodes("./server")) {
                    CacheRefresher cr = new CacheRefresher();
                    cr.Url = "http://" + xmlHelper.GetNodeValue(n) + GlobalSettings.Path + "/webservices/cacheRefresher.asmx";
                    cr.RefreshById(uniqueIdentifier, Id, _login, _password);
                }


The umbraco.presentation.cache.CachRefresher() is your soap wrapper class and it has a default constructor (line 31):
Code:

        public CacheRefresher() {
            this.Url = "http://" + System.Web.HttpContext.Current.Request.ServerVariables["SERVER_NAME"] + "/umbraco/webservices/cacheRefresher.asmx";
        }



And this default constructor is the cause of the problem! Because when the global.asax.cs timer fires and the code runs all the way to here, it is running in a background thread on the web-server. Therefore System.Web.HttpContext.Current is NULL and this will throw an exception. If anyone is reporting the problem with large databases, there will also be "Object Reference Not Set to Instance of Object" type errors in their UmbracoLog. I suggest testing if the HttpContext is null, putting try{} catch{} or commenting out that constructor line?

****************************************************************************************************
Earlier I mentioned that in publishingService.CheckPublishing(object sender) the d.Publish()
method is called before library.PublishSingleNode()....
****************************************************************************************************

The d.Publish() method succeeds every time and inserts a row into the cmsDocument table, but the PublishSingleNode() fails and throws an exception because of what I described above. Now this code runs in the background timer ON EACH WEBSERVER in the farm, so multiply the errors every 1 minute X number of servers and you will see how you get so many rows in the cmsDocument table in the database.


(EDIT)
I have tested this by artificially changing the releaseDate of a row in the cmsDocument table, after making sure I had set:

Code:

<distributedCall enable="true">
   <user>0</user>
    <servers>
    <server>localhost</server>
    </servers>
  </distributedCall>


in the /config/umbracoSettings.config file.

Final recommendation is to change the umbraco.presentation.cache.CachRefresher() constructor to:

Code:

public CacheRefresher() {
            if (System.Web.HttpContext.Current != null)
            {
                this.Url = "http://" + System.Web.HttpContext.Current.Request.ServerVariables["SERVER_NAME"] + "/umbraco/webservices/cacheRefresher.asmx";
            }
        }



Neil

drobar
Posted: Friday, November 28, 2008 2:51:38 PM

Rank: Umbracoholic

Joined: 9/8/2006
Posts: 2,020
Location: MA, USA; Cambridge, UK
Awesome bit of work here Niel! Can you please post this to codeplex as well?

cheers,
doug.

MVP 2007-2009 - Percipient Studios - Percipient Blog
neilf
Posted: Friday, November 28, 2008 3:40:11 PM

Rank: Devotee

Joined: 3/19/2008
Posts: 69
Location: London
Thanks Doug - will do.

I may point a link to this forum as I'm not sure whether the codeplex textbox would mangle the format to make it more difficult to read.

(EDIT)
Codeplex issue:
http://www.codeplex.com/umbraco/WorkItem/View.aspx?WorkItemId=19842


Neil

dp_sbbs
Posted: Monday, December 01, 2008 8:32:27 AM

Rank: Enthusiast

Joined: 9/10/2008
Posts: 25
Location: Malmoe, Sweden
Thomas

We runned the clean-up script but still has 26 items in each table when running:

Code:

SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
    AND ID not in (select Nodeid from cmsDocument)

SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
    AND ID not in (select contentNodeid from cmsPropertyData)

SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
    AND ID not in (select contentid from cmsContentVersion)


What should we do?
neilf
Posted: Wednesday, December 03, 2008 3:21:32 PM

Rank: Devotee

Joined: 3/19/2008
Posts: 69
Location: London
neilf wrote:
Thanks Doug - will do.

I may point a link to this forum as I'm not sure whether the codeplex textbox would mangle the format to make it more difficult to read.

(EDIT)
Codeplex issue:
http://www.codeplex.com/umbraco/WorkItem/View.aspx?WorkItemId=19842


BUMP

Not much notice taken of this one since I first raised it. I'm fairly certain its a sleeping beast for people running Umbraco on more than 1 server behind a load-balancer.

How much this affects you is proportional to how often you change content / publish pages.

I could of course fix it in every version of Umbraco I deploy but some clients prefer to stick with standard builds (less customisations lowers internal support cost and needs less supplementary documentation).

If you agree with this issue / suggested fix - vote for it at Codeplex and get it into the next build or V3 dot-release (e.g. 3.0.7)


Neil

hoehler
Posted: Tuesday, December 09, 2008 3:12:10 PM

Rank: Addict

Joined: 7/19/2006
Posts: 634
Location: Bad Homburg, Germany
dp_sbbs wrote:
Thomas

We runned the clean-up script but still has 26 items in each table when running:

Code:

SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
    AND ID not in (select Nodeid from cmsDocument)

SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
    AND ID not in (select contentNodeid from cmsPropertyData)

SELECT * FROM umbracoNode
WHERE nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972'
    AND ID not in (select contentid from cmsContentVersion)


What should we do?


Hi ?

It is annoying: I don't get any mail from this forum Brick wall .

Can you mail me a db-dump to th at thoehler dot com. So I can take a look at the db. Perhaps we missunderstood each other.

There are two different things: On the one hand you can have lost data in the db which you get via the sql you posted. On the other hand you have massive grown cmspropertydata caused by the bug mentioned by neilf (btw: great job neilf), but they are related to existing and published contentnodes.

The first case you can delete the lost rows in the database. In the second case you have to delete the old and unneccessary versions which can be done by my clienttools. (In the near future with some additional features). The problem is that the cmsPropertyData is optimized for reading and not for writing or deleting. So deleting many rows at once will cause timeouts. With the new version of the clienttools you will get more options on how to delete rows.

Thomas

• 2007/2008 MVP • www.thoehler.comblog.thoehler.com • Bad Homburg, Germany
boris.masis
Posted: Tuesday, December 09, 2008 6:10:57 PM
Rank: Newbie

Joined: 4/5/2007
Posts: 4
We've applied Niel's fix:

Change the umbraco.presentation.cache.CacheRefresher() constructor to
Code:
public CacheRefresher() {
            if (System.Web.HttpContext.Current != null)
            {
                this.Url = "http://" + System.Web.HttpContext.Current.Request.ServerVariables["SERVER_NAME"] + "/umbraco/webservices/cacheRefresher.asmx";
            }
        }


However, we still have run-away nodes creating thousands of revisions in the cmsPropertyData table. Should we run client tools with "number of items" set to 1 or is there another suggested solution?
dp_sbbs
Posted: Thursday, December 11, 2008 8:25:43 AM

Rank: Enthusiast

Joined: 9/10/2008
Posts: 25
Location: Malmoe, Sweden
hoehler wrote:

Can you mail me a db-dump to th at thoehler dot com. So I can take a look at the db. Perhaps we missunderstood each other.


Thomas, I mailed you a few days back, did you receive it?
hoehler
Posted: Thursday, December 11, 2008 9:51:03 AM

Rank: Addict

Joined: 7/19/2006
Posts: 634
Location: Bad Homburg, Germany
dp_sbbs wrote:
Thomas, I mailed you a few days back, did you receive it?

Just mailed to you right now...

Thomas

• 2007/2008 MVP • www.thoehler.comblog.thoehler.com • Bad Homburg, Germany
hoehler
Posted: Thursday, December 11, 2008 12:33:40 PM

Rank: Addict

Joined: 7/19/2006
Posts: 634
Location: Bad Homburg, Germany
Therefore that I got many mails asking for some help about the growing database I took this as reason for my first blogpost on my blog:
http://blog.thoehler.com: Understanding the umbraco database. In the next posts I will explain how to get rid of this...

• 2007/2008 MVP • www.thoehler.comblog.thoehler.com • Bad Homburg, Germany
chicotroimoihieu
Posted: Friday, December 12, 2008 4:44:36 AM
Rank: Newbie

Joined: 7/7/2008
Posts: 2
Location: USA
Finally, umbracolog table grows too fast days after days, Can you please tell me how to stop umbraco from writing records into this table?

Thank you very much!
boris.masis
Posted: Tuesday, December 16, 2008 3:38:42 PM
Rank: Newbie

Joined: 4/5/2007
Posts: 4
To answer my own question, the only we've been able to stop the growth of the cmsPropertyData table has been to comment out the publishing timer in the source code. Niel's fix did not seem to solve the problem, but after commenting out the publishing timer the runaway revisions seem to have stopped. I assume this also disables scheduled publishing, but regular publishing appears to work fine.

In umbraco 3.03 you'll find the publishing timer located in umbraco/presentation/requestModule.cs (it may be in a different place for other umbraco versions). Once you've found it, just comment out 2 lines as shown below:

Code:

//publishingTimer =
//    new Timer(new TimerCallback(publishingService.CheckPublishing), HttpApp.Context, 600000, 60000);


A real solution for the problem seems to be elusive. I've documented our troubleshooting of this issue so far at http://www.borism.net/2008/12/16/fixing-a-large-cmspropertydata-table-in-umbraco/
hoehler
Posted: Thursday, January 08, 2009 3:57:19 PM

Rank: Addict

Joined: 7/19/2006
Posts: 634
Location: Bad Homburg, Germany
Added a new post to my blog describing how to delete various versions from given nodes.

hth, Thomas

• 2007/2008 MVP • www.thoehler.comblog.thoehler.com • Bad Homburg, Germany
changlinn
Posted: Thursday, February 12, 2009 5:01:22 AM

Rank: Newbie

Joined: 6/26/2008
Posts: 11
Location: Sydney
Can you post your clean up script here hoehler, I sent you an email as well for it as I have a real run-a-way, CMS-property is up to 2gb.
changlinn
Posted: Monday, February 23, 2009 5:53:04 AM

Rank: Newbie

Joined: 6/26/2008
Posts: 11
Location: Sydney
Don't know if I should post a new thread or not, but I have a run-away Umbraco database as well. I tried the full script from your Blog post Hoehler but that didn't seem to work. Below is some more info
when I do
SELECT nodeId, COUNT(*) Number
FROM cmsDocument
GROUP BY nodeId
HAVING COUNT(*) > 2
ORDER BY Number DESC

I get:
nodeid Number
1748 93245
1755 92072
1763 91118
1065 68345
1080 78

Which shows I have a lot of problmatic nodes with multiple versions. I don't know how to remove these extrenuos versions however.

I also do the below:
Create Table #Temp(Name sysname, rows int, reserved varchar(100), data varchar(100), index_size varchar(100), unused varchar(100))

exec sp_msforeachtable 'Insert Into #Temp Exec sp_spaceused ''?'''

Select * From #Temp

Drop Table #Temp


and see the following (culled to make it easier to see the bigguns):
NAME ROWS Reserved DATA INDEX_SIZE UNUSED
umbracoLog 1083699 759368 KB 756368 KB 2824 KB 176 KB
cmsPropertyData 4276082 2039496 KB 1804272 KB 234688 KB 536 KB
umbracoNode 653 272 KB 128 KB 88 KB 56 KB
cmsPropertyType 212 48 KB 32 KB 16 KB 0 KB
cmsContentVersion 348816 15112 KB 14896 KB 8 KB 208 KB
cmsContentXml 442 1208 KB 1128 KB 16 KB 64 KB
cmsDocument 348743 47432 KB 47056 KB 232 KB 144 KB


Which shows I really have an issue with the size of the CMS property and Umbraco log tables.
Any help would be greatly appreciated, I have a couple Umbraco pages on this server and the rest are fine, and they were all setup as a copy of one original so I am a bit confused as to why this one is so screwy.
jebbushell
Posted: Monday, February 23, 2009 9:51:10 AM
Rank: Devotee

Joined: 12/2/2008
Posts: 50
Location: UK
I have a script, based on what Thomas Hoehler and Hendry Racher have done.

It does not address cause. It just deletes the data that I didn't need on my database, and that was lots. It has not screwed my db but that doesn't mean that it won't screw yours. Think of it as worth every penny you paid for it. Enjoy:

Umbraco V4 Database Cleanup

The forum has moved

This forum is no longer in use, so you can't reply to this message - please go to Our Umbraco

Users browsing this topic
Guest


You cannot post new topics in this forum.
You cannot reply to topics in this forum.
You cannot delete your posts in this forum.
You cannot edit your posts in this forum.
You cannot create polls in this forum.
You cannot vote in polls in this forum.