Find & Replace Partial Strings

I recently had a request to mask part of a text field in MySQL. The data was JSON, but stored as plain text in MySQL 5.6.

Had this been Mongo, it could have been a pretty quick update of the data within an array:

  1. Find the data

    db.accounts.find({ "vendor.account_number": { $exists: true } }, { "vendor": 1, "timestamp": 1 }).forEach(printjson)

  2. Update the data

    db.accounts.update(
      { "vendor.account_number": { $exists: true } },
      { $set: { "vendor.name": "XXXXXX", "vendor.account_number": "XXXXXX" } },
      { multi: true }
    )

On MySQL, this was a bit trickier. I was able to pull out the segment to be updated using SUBSTRING_INDEX. Once I could identify those substring values, I used REPLACE, with the nested SUBSTRING_INDEX expression as the substitution “from” value.

  • SUBSTRING_INDEX Syntax: SUBSTRING_INDEX(str,delim,count)
    • My data had variable-length fields, but the prefix before each value and the characters immediately after it were consistent:
    • SUBSTRING_INDEX(request, '"vendor":{"name":"', -1)
      • str = request (my column)
      • delim = '"vendor":{"name":"' (name was potentially used in more locations, hence the longer string)
      • count = -1
        • “If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned”
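
To make the count semantics concrete, here is a minimal JavaScript sketch that imitates SUBSTRING_INDEX and runs the same nested extraction on a made-up request string. The helper name `substringIndex` and the sample JSON are assumptions for illustration only; this is not MySQL's implementation, just an approximation of the documented behavior.

```javascript
// Rough imitation of MySQL's SUBSTRING_INDEX(str, delim, count).
// Hypothetical helper for illustration, not the real implementation.
function substringIndex(str, delim, count) {
  if (count === 0 || delim === "") return "";
  const parts = str.split(delim);
  // Fewer delimiters than requested: the whole string comes back.
  if (Math.abs(count) >= parts.length) return str;
  return count > 0
    ? parts.slice(0, count).join(delim) // left of the count-th delimiter
    : parts.slice(count).join(delim);   // right of the count-th delimiter from the end
}

// Made-up sample row, shaped like the data described above.
const request = '{"id":7,"vendor":{"name":"Acme Corp","account_number":"12345"}}';

// Step 1: keep everything to the right of the long prefix.
const tail = substringIndex(request, '"vendor":{"name":"', -1);
// Step 2: keep everything to the left of the first '",' in that tail.
const vendorName = substringIndex(tail, '",', 1);
// Same idea for the account number, which ends at '"}'.
const accountNumber = substringIndex(
  substringIndex(request, '"account_number":"', -1), '"}', 1);

console.log(vendorName);    // Acme Corp
console.log(accountNumber); // 12345
```

The inner call trims the front of the string and the outer call trims the back, which is why the two calls are nested rather than chained.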

  • REPLACE Syntax: REPLACE(str,from_str,to_str)
    • Since my data varied on each record, I had to build a new “from string” to match for every row. I set the update on the same column twice because I was running multiple modifications.
      • REPLACE(request, SUBSTRING_INDEX(SUBSTRING_INDEX(request, '"vendor":{"name":"', -1), '",', 1), 'XXXXXXXX')
        • str = request (my column)
        • from_str = SUBSTRING_INDEX(SUBSTRING_INDEX(request, '"vendor":{"name":"', -1), '",', 1) (my string subsection, variable by row)
        • to_str = 'XXXXXXXX' (my replacement/“to” value)
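
The REPLACE step can be sketched the same way. The JavaScript below is an illustration under assumptions: `substringIndex` and `mysqlReplace` are hypothetical stand-ins for the MySQL functions, and the sample row is invented. It also shows one thing worth checking before running the real update: REPLACE swaps every occurrence of the from string, not only the one inside the vendor block.

```javascript
// Hypothetical stand-in for SUBSTRING_INDEX (see the syntax notes above).
function substringIndex(str, delim, count) {
  if (count === 0 || delim === "") return "";
  const parts = str.split(delim);
  if (Math.abs(count) >= parts.length) return str;
  return count > 0 ? parts.slice(0, count).join(delim)
                   : parts.slice(count).join(delim);
}

// Hypothetical stand-in for REPLACE(str, from_str, to_str):
// case-sensitive, replaces all occurrences of from_str.
function mysqlReplace(str, fromStr, toStr) {
  if (fromStr === "") return str; // nothing to search for
  return str.split(fromStr).join(toStr);
}

// Invented row in which the vendor name also appears in another field.
const request =
  '{"vendor":{"name":"Acme Corp","account_number":"12345"},"note":"Acme Corp invoice"}';

// from_str is computed per row, exactly like the nested expression above.
const fromStr = substringIndex(
  substringIndex(request, '"vendor":{"name":"', -1), '",', 1);
const masked = mysqlReplace(request, fromStr, "XXXXXXXX");

console.log(fromStr); // Acme Corp
console.log(masked);
// {"vendor":{"name":"XXXXXXXX","account_number":"12345"},"note":"XXXXXXXX invoice"}
```

For a masking job, replacing the extra occurrence in the note field is probably welcome, but it is a side effect to be aware of when the extracted value is short or common.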

  1. Find the data

    SELECT id
    , SUBSTRING_INDEX(SUBSTRING_INDEX(request, '"vendor":{"name":"', -1), '",', 1) AS vendor_name_part
    , SUBSTRING_INDEX(SUBSTRING_INDEX(request, '"account_number":"', -1), '"}', 1) AS account_number_part
    , request
    FROM call_logs
    WHERE request LIKE '%vendor%account_number%';

  2. Update the data

    UPDATE call_logs
    SET request = REPLACE(request, SUBSTRING_INDEX(SUBSTRING_INDEX(request, '"vendor":{"name":"', -1), '",', 1), 'XXXXXXXX'),
    request = REPLACE(request, SUBSTRING_INDEX(SUBSTRING_INDEX(request, '"account_number":"', -1), '"}', 1), 'XXXXXXXX')
    WHERE request LIKE '%vendor%account_number%';
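
One thing to check before running an update like this: SUBSTRING_INDEX returns the whole string when the delimiter is not found, so a row that passes the loose LIKE filter without containing the exact prefix can be mangled. The JavaScript sketch below imitates the two assignments, applied left to right as MySQL does for a single-table UPDATE. The helper names and both sample rows are hypothetical.

```javascript
// Hypothetical stand-ins for the MySQL functions used in the UPDATE.
function substringIndex(str, delim, count) {
  if (count === 0 || delim === "") return "";
  const parts = str.split(delim);
  if (Math.abs(count) >= parts.length) return str; // delimiter missing: whole string
  return count > 0 ? parts.slice(0, count).join(delim)
                   : parts.slice(count).join(delim);
}
function mysqlReplace(str, fromStr, toStr) {
  return fromStr === "" ? str : str.split(fromStr).join(toStr);
}

const NAME_PREFIX = '"vendor":{"name":"';
const ACCT_PREFIX = '"account_number":"';

// Imitates the two SET assignments; the second sees the result of the first.
function maskRow(request) {
  const step1 = mysqlReplace(
    request, substringIndex(substringIndex(request, NAME_PREFIX, -1), '",', 1), "XXXXXXXX");
  return mysqlReplace(
    step1, substringIndex(substringIndex(step1, ACCT_PREFIX, -1), '"}', 1), "XXXXXXXX");
}

// Matches the exact prefixes: masked as intended.
const good = maskRow('{"vendor":{"name":"Acme Corp","account_number":"12345"}}');
// Matches LIKE '%vendor%account_number%' but has no "vendor":{"name":" prefix.
const bad = maskRow('{"vendor_id":9,"user":"bob","account_number":"777"}');

console.log(good); // {"vendor":{"name":"XXXXXXXX","account_number":"XXXXXXXX"}}
console.log(bad);  // XXXXXXXX","account_number":"XXXXXXXX"}
```

If rows like the second one could exist in the table, a tighter filter such as WHERE request LIKE '%"vendor":{"name":"%"account_number":"%' would be one way to keep them out of the update. Running the SELECT from step 1 first and eyeballing the extracted parts serves the same purpose.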

[Screenshots: the request column before and after the update]

Group & Count Objects by Day – Mongo

As a SQL DBA, I have some common queries I use to get certain information, organized by date. I’m starting to do a bit more work with Mongo, and it can be a bit frustrating trying to translate what I can do with SQL into Mongo.

Here is one I figured out recently that I thought may be useful for others. I want to get a count of items for a certain period of time, grouped & ordered by date.

Take this example in MySQL:

  • Select: I’m formatting creationdate as OrderDate, which is then used by the grouping & ordering. This strips out the time portion that would otherwise affect the group by (e.g. 7/1/15 10:51 and 7/1/15 11:51 can now be grouped)
  • Count: This is the data to be grouped by date
  • Where clause: Just get me the data since 7/1/15
  • Group by: Use the OrderDate for grouping
  • Order By: Use the OrderDate for ordering, in ascending order (can use DESC for reverse order)

SELECT DATE_FORMAT(O.creationdate, '%m-%d-%y') AS OrderDate,
COUNT(DISTINCT O.ORDER_ID) AS OrderCount,
COUNT(OL.ORDERLINE_ID) AS OrderLines
FROM ORDERLINE OL
-- ORDER is a reserved word in MySQL, so the table name needs backticks
JOIN `ORDER` O ON OL.ORDERID = O.ORDER_ID
WHERE O.creationdate > '2015-07-01'
AND O.DELETED = 0
AND OL.DELETED = 0
GROUP BY OrderDate
ORDER BY OrderDate ASC;

Sample data:

OrderDate  OrderCount  OrderLines
7/1/15     163         12553
7/2/15     160         16981
7/3/15     18          1194
7/5/15     7           2488
7/6/15     176         13328
7/7/15     148         13586
7/8/15     186         12795
7/9/15     143         17210
7/10/15    84          7088

On Mongo, I can do something similar. In this instance I’m counting the documents, but there are additional operators to sum, average, count, etc. data within documents.

  • Match: In the aggregate, it’s like the Where clause: just get me the documents since 7/1/15
  • Group: This is where I’m defining the grouping criteria. There is probably a prettier way to do this, but I’m breaking out the year, month & day for each timestamp, similar to the date formatting in the SQL select.
  • Count: the data I’m grouping by date
  • Sort: Order by the _id group, 1 is ascending order, -1 is descending order.

db.Orders.aggregate(
  [
    { $match:
      { timestamp: { $gt: ISODate("2015-07-01T00:00:00.000Z") } }
    },
    { $group: {
        _id: { year: { $year: "$timestamp" }, month: { $month: "$timestamp" }, day: { $dayOfMonth: "$timestamp" } },
        count: { $sum: 1 }
      }
    },
    { $sort: { _id: 1 } }
  ]
)
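
For anyone coming from SQL, it may help to see what the three stages do in plain JavaScript. The sketch below imitates the pipeline over an in-memory array: the function name `countByDay` and the sample documents are made up, and it assumes the date operators work in UTC, which is their default when no timezone is given.

```javascript
// Imitates $match ($gt on timestamp), $group (year/month/day + count), and $sort.
// Hypothetical helper for illustration; the real work happens inside Mongo.
function countByDay(docs, since) {
  const groups = new Map();
  for (const { timestamp } of docs) {
    if (timestamp <= since) continue; // $match: keep timestamp > since
    // $group key: the UTC calendar day, e.g. "2015-07-01"
    const key = timestamp.toISOString().slice(0, 10);
    const group = groups.get(key) || {
      _id: {
        year: timestamp.getUTCFullYear(),
        month: timestamp.getUTCMonth() + 1, // JS months are 0-based
        day: timestamp.getUTCDate(),
      },
      count: 0,
    };
    group.count += 1; // count: { $sum: 1 }
    groups.set(key, group);
  }
  // $sort: { _id: 1 } orders by year, then month, then day
  return [...groups.keys()].sort().map((key) => groups.get(key));
}

// Made-up documents: one before the cutoff, two on 7/1, one on 7/2.
const docs = [
  { timestamp: new Date("2015-06-30T23:59:59Z") },
  { timestamp: new Date("2015-07-01T10:51:00Z") },
  { timestamp: new Date("2015-07-01T11:51:00Z") },
  { timestamp: new Date("2015-07-02T08:00:00Z") },
];
const result = countByDay(docs, new Date("2015-07-01T00:00:00Z"));
console.log(JSON.stringify(result));
```

The two 7/1 documents land in the same group because only the year, month and day feed the key, which mirrors what DATE_FORMAT does for the GROUP BY in the SQL version.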

Sample data:

{ "_id" : { "year" : 2015, "month" : 7, "day" : 1 }, "count" : 956 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 2 }, "count" : 107 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 3 }, "count" : 73 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 4 }, "count" : 73 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 5 }, "count" : 73 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 6 }, "count" : 6385 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 7 }, "count" : 106 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 8 }, "count" : 3 }
{ "_id" : { "year" : 2015, "month" : 7, "day" : 9 }, "count" : 2 }