Announcement

Collapse
No announcement yet.

Ideas for bringing data into MySQL from a flatfile

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    Ideas for bringing data into MySQL from a flatfile

    Hi all,

    I have been working on a project that involves importing data from a .csv flatfile into a database and then analyzing the data (both using Mivascript). I am wondering what approach some of you all might take in comparison with what I have done thus far.

    The first version I wrote using Miva xbase database files. The method was to simply use mvimport and bring all the data in from the flatfile, scrubbing the data a bit as well on the import (to remove quotes and such). On a flatfile containing about 260,000 lines, this took ballpark ... maybe 10-15 mins for the import (on an office box using Miva Mia). Not bad. The other part of my script that runs the various analyses required another 14 mins or so to run. But, in the future this analysis part will become more complex.

    So, then I used MySQL for the backend. The nice thing with MySQL is the analyses can run in about 4 seconds instead of 14 mins! But importing the data into MySQL using mvimport with INSERT queries takes about an hour or even more. Much, much slower. Interesting.

    So overall, on the face of it it would seem importing data into Miva's native xbase DBFs is much faster than bringing it into MySQL. But I would really prefer MySQL for this because of the flexibility it gives me in accessing the data once it is there and the speed of querying the existing data. And as the analysis becomes more involved, the MySQL speed will be nice and in fact necessary.

    There are also a couple of ways to bring the data straight into MySQL from either a Windows CL or the MySQL CL Client. These are a lot faster than INSERTS. I was thinking I could easily write a batch file. But then there issues with cleaning up the data first, and also with how to streamline this all so a regular user can do everything from a web browser. I really like it to be that simple.

    What approaches have some of you all taken in situations like this?

    Regards and Thanks for any input,
    Sean Harrell
    Southland Trade Corp.

    #2
    Sean,

    It's not that importing data into mySQL is that much slower than into xbase databases, it's just that the VM is very slow at INSERT(ing) records into mySQL. Importing data into mySQL using virtually any other tool is dramatically faster, which reaffirms that there is an issue with the VM and how it talks to mySQL - not mySQL itself, which is much faster than that if you use other tools.

    Short term solution - import data using other tools, then do your analysis as usual (hey, 4 seconds sounds good!). Long term solution - someone on MIVA development should have a look at this and see what the VM is really doing when it INSERTs records into mySQL tables.

    Comment


      #3
      Yeah was wondering if it was maybe on the Miva VM side of things.

      The rate of inserts on a P4 box here is only about 50 recs/sec using Mivascript/MySQL.

      Using C:\>mysql -u user -ppwd MYDB < backedupdb.sql

      .. I can bring in 260,000 rows in about 10 seconds.. heh.
      Sean Harrell
      Southland Trade Corp.

      Comment


        #4
        Sean,

        Did you try to trigger the MySQL import via Empresa with DBCOMMAND? I haven't tried it, but I can imagine that it may work. Other than that, 50 inserts per sec is absurdly slow, you should run several 1000s per sec with Miva. At least that are results that I benchmarked on a very slow machine a long time ago. It really sounds to me as if another problem is hiding in your code or your installation (like for example that you establish new network connections for every insert, that you may have to little memory or that your CPU is too busy to run both Empresa/Mia and the MySQL server).

        From my own mistakes I have seen many times that tiny scripting inefficiencies can have a disastrous effect - and of course in your case this is amplified 260.000 times!

        Markus
        Emerald Media, Trade & Technology USA/Germany
        Professional Miva Programming since 1998
        Home of the Emerald Objects Application Server (EOA)
        Multi-dimensional CRM, CMS & E-Commerce

        http://www.emeraldobjects.com
        http://www.sylter-seiten.com

        Comment


          #5
          Hey Markus,

          No I am not using DBCOMMAND. I will look into that though. Here is the loop that does the inserts. I only open the connection at the start. I do some glosubs -- should be quick enough. Also, I am using MvIMPORT to do the loop and assign the lines of text in the CSV file to local variables -- so I can insert them as field values. Maybe that's not correct, but it is the only way I know how at the moment.

          Code:
          <mvopen
          NAME="open"
          DATABASE="pm@ntbox:3306"
          TYPE="MySQL"
          username="someuser"
          password="xxxxx">
          
          <MvIF EXPR = "{MvOPEN_ERROR}">
          Could not connect to database.<br><br><br>
          </MvIF>
          
          <MvQUERY NAME="open" QUERY="delete from main;">
          
          <MvASSIGN NAME="l.apos" VALUE="'">
          
          <MvIMPORT FILE="devdata/po_34special_transactiondetail.csv" FIELDS="l.receipt,l.datime,l.des,l.sku,l.qty" DELIMITER=",">
          
          <MvASSIGN NAME="l.receipt1" VALUE="{glosub(l.receipt,'"','')}">
          <MvASSIGN NAME="l.datime1" VALUE="{glosub(l.datime,'"','')}">
          <MvASSIGN NAME="l.des1" VALUE="{glosub(glosub(l.des,'"',''),l.apos,'')}">
          <MvASSIGN NAME="l.sku1" VALUE="{glosub(l.sku,'"','')}">
          <MvASSIGN NAME="l.qty1" VALUE="{glosub(l.qty,'"','')}">
          
          <MvQUERY NAME="open" QUERY="insert into main values (?,?,?,?,?)" fields="l.receipt1,l.datime1,l.des1,l.sku1,l.qty1">
          
          </MvIMPORT>
          Sean Harrell
          Southland Trade Corp.

          Comment


            #6
            Hi Sean,

            As far as I can tell, this looks okay. Maybe some little improvements (although they really shouldn't make much of a difference):

            When you open the database connection to a local server, try use localhost to prevent DNS lookups. Especially when you use Mia, this is often very slow.

            To clear the existing database, use TRUNCATE main instead of <MvQUERY NAME="open" QUERY="delete from main;"> TRUNCATE clears the entire table, but keeps the empty structure, "DELETE from main" deletes every record one by one (table scan). You can of course also delete the entire table and then rebuild it, which is very fast, too.

            The actual MvIMPORT and the insert statement looks good (assuming that the code that was published got cut off in the insert statement), but you may want to try to run manual commits every x-inserts (depending on your table type in MySQL). You can for example force a commit after every 1000 records, so that the buffer in MySQL doesn't use up too much memory. On the other hand committing every record may be too much. This is of course an issue of try and error...

            Usually, when entering lots of records (although 260.000 is really a piece of cake for MySQL), you may consider to run the operation without the MySQL indices, and only at the end rebuild them. When loading data warehouses, this is a common strategy. Here this is just a possible idea to try; I doubt that it'll make much of a difference.

            More important is certainly to monitor the memory and CPU usage of your computer. It may very well be that your computer swaps out to the hard drive if your memory gets too low, and then it is no wonder that it takes that long. Miva Empresa and Mia both use quite a lot of memory and CPU time, which may be lacking for your MySQL server.

            Again, you should easily be able to insert about 1000 records and more per second. Of course you will not reach the speed of a true highly optimized MySQL client written in C/C++.

            HTH,

            Markus
            Emerald Media, Trade & Technology USA/Germany
            Professional Miva Programming since 1998
            Home of the Emerald Objects Application Server (EOA)
            Multi-dimensional CRM, CMS & E-Commerce

            http://www.emeraldobjects.com
            http://www.sylter-seiten.com

            Comment


              #7
              Thanks Markus,

              I have been looking into these things. The host name for the DB is actually a server on our intranet. So I cannot really use localhost. But I can use the IP.

              With respect to DELETE from main vs. TRUNCATE -- interesting thing there is the DELETE from main query runs as fast in Mivascript as it does directly using MySQL. It only takes about a second or even less to wipe 10s of thousands of rows using DELETE. So although TRUNCATE may be more proper it is not where the big performance issue is.

              Probably the most interesing thing is the whole resource thing. I just tested this on my workstation (P4, 3.2Ghz, 512MB RAM). It barely even taxes it. CPU utilization is about 3-5% only and about 311MB RAM used. Yet still very slow on inserts.

              I will look into using the commits. Maybe that'll help. But from the very first execution of the script I am watching the record count increase in MySQL. The rate of insert is slow at the outset and remains very constant until the end. If the buffer were consuming more and more memory, I would expect a slowdown effect.

              I'll keep digging around. I might try the script using full blown Empresa on our host's server. It would be interesting if that made a large difference.
              Sean Harrell
              Southland Trade Corp.

              Comment


                #8
                I would use MySQL load. That should be the fastest way to load any data. If the data file is not on the same machine with the database, I think you can do a file upload first. After making sure they are on the same machine, you can execute load command to load the data. For example, I create a test data file called testdata.txt that contains:

                1,2
                2,3
                4,5

                I created a table called testload like this:

                create table testload (a int, b int);

                I than executed this miva script to load the data into the table:

                <html>
                <MvOPEN NAME = "test" DATABASE = "test_db@dbmachine" TYPE = "MySQL" USERNAME="username">
                <MvIF EXPR = "{ g.MvQPEN_ERROR }">
                <MvEVAL EXPR = "{ 'Error in open: ' $ g.MvQPEN_ERROR }">
                <MvELSE>
                <MvEVAL EXPR = "{ 'Successful open <br>' }">
                </MvIF>


                <MvQUERY NAME = "test" QUERY = "load data infile '/tmp/testdata.txt' into table testload fields terminated by ','">
                <MvIF EXPR = "{ g.MvQUERY_ERROR }">
                <MvEVAL EXPR = "{ 'Error in load: ' $ g.MvQUERY_ERROR }">
                <MvELSE>
                <MvEVAL EXPR = "{ 'Successful load' }">
                </MvIF>


                </html>

                I hope this helps.

                Comment


                  #9
                  Thanks so much cagdas,

                  He he. Cool. Now that's blazing fast -- too cool -- takes oh 10 seconds for 260,000 lines.

                  I had wondered about using the MySQL "load data infile" command. I knew I might have issue getting the path correct and working since the flatile is in the Miva data area (ie outside the MySQL one). But creating a read-only share in the Miva dir directory and using an absolute path works like a charm.

                  Here's the finished code that handles the importing of the uploaded file:

                  Code:
                  <MvASSIGN NAME="l.apos" VALUE="'">
                  
                  <MvASSIGN NAME="l.delfile" VALUE="{fdelete('devdata/' $ 'pmcatdata.dat')}">
                  
                  <MvIMPORT FILE="devdata/po_34special_transactiondetail.csv" FIELDS="l.receipt,l.datime,l.des,l.sku,l.qty" DELIMITER=",">
                  
                  <MvASSIGN NAME="l.receipt1" VALUE="{glosub(l.receipt,'"','')}">
                  <MvASSIGN NAME="l.datime1" VALUE="{glosub(l.datime,'"','')}">
                  <MvASSIGN NAME="l.des1" VALUE="{glosub(glosub(l.des,'"',''),l.apos,'')}">
                  <MvASSIGN NAME="l.sku1" VALUE="{glosub(l.sku,'"','')}">
                  <MvASSIGN NAME="l.qty1" VALUE="{glosub(l.qty,'"','')}">
                  
                  <MvEXPORT
                  FILE="devdata/pmcatdata.dat"
                  FIELDS="l.receipt1,l.datime1,l.des1,l.sku1,l.qty1"
                  DELIMITER="|">
                  
                  </MvIMPORT>
                  
                  <mvopen
                  NAME="open"
                  DATABASE="[email protected]:3306"
                  TYPE="MySQL"
                  username="someuser"
                  password="xxxxx">
                  
                  <MvIF EXPR = "{MvOPEN_ERROR}">
                  Could not connect to database.<br>
                  </MvIF>
                  
                  <MvQUERY NAME="open" QUERY="delete from main;">
                  
                  <MvQUERY NAME="open" QUERY="load data infile '/MivaMia/DATA/devdata/pmcatdata.dat' into table main fields terminated by '|';">
                  Sean Harrell
                  Southland Trade Corp.

                  Comment


                    #10
                    Re: Ideas for bringing data into MySQL from a flatfile

                    Can this be used to load data into specific fields...

                    ie - I have a client that needs to update 50,000 products with new pricing. Normal flatfile upload simply times out. So a product code and price.

                    Bill
                    William Gilligan - Orange Marmalade, Inc.
                    www.OrangeMarmaladeinc.com

                    Comment


                      #11
                      Re: Ideas for bringing data into MySQL from a flatfile

                      Yes you can: http://dev.mysql.com/doc/refman/4.1/en/load-data.html

                      Code:
                      LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
                      Last edited by Brandon MUS; 04-09-08, 12:32 PM.

                      Comment


                        #12
                        Re: Ideas for bringing data into MySQL from a flatfile

                        Cool.
                        thank you - I did not notice that before.
                        Excellent.

                        Bill
                        William Gilligan - Orange Marmalade, Inc.
                        www.OrangeMarmaladeinc.com

                        Comment


                          #13
                          Re: Ideas for bringing data into MySQL from a flatfile

                          But isn't that designed to delete the existing products and add in a whole new set of records into the products database? That would be different from just updating the price of existing records.
                          Bill Weiland - Emporium Plus http://www.emporiumplus.com/store.mvc
                          Online Documentation http://www.emporiumplus.com/tk3/v3/doc.htm
                          Question http://www.emporiumplus.com/mivamodu...vc?Screen=SPTS
                          Facebook http://www.facebook.com/EmporiumPlus
                          Twitter http://twitter.com/emporiumplus

                          Comment


                            #14
                            Re: Ideas for bringing data into MySQL from a flatfile

                            Originally posted by wcw View Post
                            But isn't that designed to delete the existing products and add in a whole new set of records into the products database? That would be different from just updating the price of existing records.
                            Hi Bill,

                            No, the LOAD datafile command adds new records.

                            In this situation, I would write a simple standalone Mivascript that imports the text file and that then updates the SQL database one by one (line by line). That should be very fast. The updates shouldn't take less than a minute.

                            Markus
                            Emerald Media, Trade & Technology USA/Germany
                            Professional Miva Programming since 1998
                            Home of the Emerald Objects Application Server (EOA)
                            Multi-dimensional CRM, CMS & E-Commerce

                            http://www.emeraldobjects.com
                            http://www.sylter-seiten.com

                            Comment


                              #15
                              Re: Ideas for bringing data into MySQL from a flatfile

                              I was looking at Sean's example which had
                              <MvQUERY NAME="open" QUERY="delete from main;">
                              just before the load data infile.

                              I still don't see how load data infile would help Bill update prices. It is used to add new records, not update a single field, right?
                              Bill Weiland - Emporium Plus http://www.emporiumplus.com/store.mvc
                              Online Documentation http://www.emporiumplus.com/tk3/v3/doc.htm
                              Question http://www.emporiumplus.com/mivamodu...vc?Screen=SPTS
                              Facebook http://www.facebook.com/EmporiumPlus
                              Twitter http://twitter.com/emporiumplus

                              Comment

                              Working...
                              X