What!? How could anyone say such a thing about the great CfQueryParam tag?
Well, for starters, I’m not saying don’t ever use it. In fact you should use it as much as possible because it helps protect you from SQL injection attacks and improves the performance of your queries. What I am saying is that in the situation I go into below, you really should consider avoiding it, but only if it’s safe to do so. Plus the fact that I’ll pick on some other tags and stuff too but felt like CfQueryParam deserved to take the largest chunk of the blame.
My problem with CfQueryParam is about memory leaks. While pushing ColdFusion 8 to accomplish the scenario below, the server crashed. I couldn’t really tell what had happened so I added some logging and found that memory was being eaten up pretty quickly. Now this is just one task, I’m not talking about several requests building up a memory leak over some time. One task grabbing all the memory it can get until ColdFusion says it’s got no more and crashes.
Given a CSV file, we need to read it line by line and perform several queries based on the data. We also need to do this all in one transaction, since if there is any invalid data we’ll have to rollback everything we’ve changed so far. The CSV file I used for testing had 68000 rows of data, each row having about 10 columns.
Now to parse the CSV I took advantage of some code by Ben Nadel that will convert it into a nice array. He does note that if you’re going to hold that much data in one go, expect to use up a lot of memory. So I changed things around a little. Now I was reading in the file a line at a time using FileOpen and FileReadLine, resulting in a one dimensional array of the row columns to deal with and then discard. This worked great, no memory leaks and extremely fast (thanks Ben ).
Then we add some queries and other things to the mix giving you this basic outline (not allowed to show you the actual code):
- Create an instance of a CFC.
- FileOpen the CSV
- Loop until the end of the file
- Read a single line from the file
- Parse into an array.
- Pre-process and validate some of the data we’ll be using.
- Perform some queries
- SELECT some data
- SELECT some more
- INSERT a record
- UPDATE another
- EXEC a stored procedure
- Use the CFC to write another record to the database.
- Another query to UPDATE all related records.
When things started to go wrong
Everything was written, I had a couple of test runs with smaller datasets (100 rows) to make sure everything was working correctly and I didn’t spot any issues. When I threw 68000 rows at it I knew it’d take a while, but I didn’t expect the memory leak. It made it to about 10000 records and died.
Rewrites and Memory / Speed graphs
There’s nothing like a graph to show memory leaking and here they are (I’ve limited them to 32700 iterations due to some logs being interrupted). The blue line that hits the top of the chart first is the original version of the script. You can click the image for a larger version.
The next graph shows the difference in speed between versions, not as important but nice to see. The lower the line when it ends, the quicker the version was.
Now the graphs wouldn’t make much sense if I didn’t explain the changes. Here is a description of each change and which item of the graph they are.
This is the initial template, still with its several CFQuery, CfQueryParam and CFC calls. It made it just over 10,000 records before choking the server.
Commented out all the CFQuery tags (and therfore the CFQueryParams) so we’re only using the CFC to perform a query. Made it further before crashing (18,600).
This really skipped a version but I threw again the CFC and combined the query it did with all the others. Now we have a single CFQuery per row doing a lot of SQL. It still only made it to about 25,000 until I removed the CFQueryParam tags as well. Then it got up to 56,000, almost there but we’re still leaking memory all over the place on the way.
This version made it all the way to the end. I decided to buffer the SQL so we only used one CFQuery for every 25 rows in the CSV. Initially I used a string and it caused the memory usage to bounce all over the place, so I replaced that with an array being appended each iteration of the loop. Then every 25 rows I’d ArrayToList(sqlBuffer, ‘ ‘) inside a CfQuery tag. This was the fastest working version so far but it was still leaking even if it was at a slow enough rate to be successful.
I upped the buffer to 250 rows per CFQuery. Which made little difference to speed or memory usage from Version D.
Those combined Queries that I was appending to the array seemed like a perfect excuse for a stored procedure. So I packed up all the SQL and palmed it off on the Database (MSSQL) and ran it from there instead. Initially I called this using CFStoredProc per row but this leaked all over the place. I guess that CFProcParam might suffer from the same problem CFQueryParam does. I went back to my buffer SQL (per 25) method but with an “EXEC sp_csvimport” each row, passing in the data as arguments (no cfqueryparam’s).
Not only did it complete the job, memory was STABLE! Looking at the logs I didn’t see a upwards trend, infact ColdFusion decided to garbage collect some of the memory back!
This was the same as Version F but instead of pre-processing the data in ColdFusion, I added some SQL to the stored procedure to do the job. Memory usage was the same but it was a lot slower at the task.
Version H (The final one! I promise)
Finally I had reached my goal with Version F:
- No leakage trend evident in the data / graph.
- Fastest version so far.
The only change I made was the number of rows per CfQuery. After playing around with different values I settled on every 100, which gave me the fastest so far and completed the job in 66 minutes for 68000 rows of data with no evidence of a memory leak.
Blimey, what a journey that was! There are probably other ways around all this, like pre-processing the CSV itself and then using a tool to import it into the database then process the data using the stored procedure. But I was up against a deadline and didn’t have time to rewrite the whole process. Thanks to Ben Nadel for the CSV to Array code, that was a great help. But the moral of the story is that even ColdFusion has it’s limits and certain actions can leak (CFQuery, CfQueryParam, CFStoredProc, CFProcParam and CFC calls) but are usually collected by the garabage routine. It you go and do things on a big scale like this you have to work around those leaks otherwise you’re not going to make it.
So carry on using all those tags mentioned, especially CFQueryParam (security is important!), but if you do ever run into a similar problem on the same scale. See what data you can trust and try and buffer those database calls.