- Home
- LLBLGen Pro
- Architecture
A SQL Challenge...
Joined: 23-Apr-2004
Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago.
Now that you have seen and appreciated the photo... Let's get technical
I've been trying to come up with an efficient way to extract similar information you are seeing on this page from a DB. The page contains an item, in this case, an image whose id is presumably "2211583" and the image belongs to an account "gen". The page looks up the image and account based on the 2 pieces of information, ID and account name, and displays the results...
This is all very well, but what is interesting to me, is the "Gen Kanai's photostream" where you can see the "previous" and "next" images from the account's "photostream". A photostream in the case of flickr is simply all the photos that Gen has ever uploaded in date order. For Gen, this list contains only 50 items, but it could obviously grow very large. What's also interesting is that the previous and next images are personalised for the user visiting the site, meaning that different users will see different previous and next images depending on what permissions they have. Some images are publically viewable, some only by friends, some only by family and some are private to Gen...
So here comes the challenge: "How does flickr find the previous and next images to display based on the ID of the main image?"
This doesn't necessarily apply to images, it could apply to, for example, "reservations": Fetch me the reservation whose ID is 1234 along with the 3 previous and 3 next reservations which were made by a person of the male sex...
At first I was thinking that this was simply a paging problem. In the case of the flickr example the page size is 3 but I'm not sure what the selection criteria would be given you only have the ID of an image which appears on that page...
I have yet to solve this problem in an efficient way (i.e. no linked lists, using a single query etc) and thought I would throw it open to see if anyone knew of a standard way to implement it...
Marcus
Joined: 15-Apr-2004
Marcus wrote:
Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago.
Yea, I saw 'Finding Nemo' 3 times so I'm sort of an expert on this jellyfish thingy.
Anyways, I can't help you on the rest of your post, but I just wanted to back you up on the jellyfish part.
Fishy
Joined: 05-Aug-2005
Within the database there has to be some way to sequence the entities. Whether it is a reservation date for reservations or a sequence number for photos. Then you can query on the sequencing field in addtion to any other fields.
To get Next SELECT Top 1 * FROM Photos WHERE Seq > @SegofCurrent AND Permission = @Whatever AND ...
To get previous SELECT Top 1 * FROM Photos WHERE Seq < @SegofCurrent AND Permission = @Whatever AND ...
I think that would be the basic idea. But this will only work if you have a field in the database which gives you some type of rownumber equivelent.
This is simmilar to paging. Cause, alot of times in paging you are creating a surrogate sequence number where one doesn't exist based on your filter and sort. Once you have your records numbered you can pull out the specific page of data.
In the case above your paging is 1 item per page, and you are displaying 3 pages at once.
BOb
Joined: 26-Oct-2003
Fishy wrote:
Marcus wrote:
Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago.
Yea, I saw 'Finding Nemo' 3 times so I'm sort of an expert on this jellyfish thingy.
Anyways, I can't help you on the rest of your post, but I just wanted to back you up on the jellyfish part.
Fishy
Lol, Fish.
Jeff...
Joined: 07-Apr-2004
Fishy wrote:
Marcus wrote:
Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago.
Yea, I saw 'Finding Nemo' 3 times so I'm sort of an expert on this jellyfish thingy.
Anyways, I can't help you on the rest of your post, but I just wanted to back you up on the jellyfish part.
Fishy
Fishy , You know i couldn't stop gigling about this for about 5 minutes.
Joined: 04-Feb-2004
There are various artificial intellegence algorythms that can do this. for example, some games that simulate decision making use feed forward neural networks to this kind of thing. I dont know that a feed forward neural net would be the best fit for your scenario, but retailers do the same types of studies, simply look at amazon.com. They make product upsells using demographics and statistics, etc.
it is feasable that the account id of the person doing the viewing and the id of the image are linked to various categories. the engine would compare tidbits from the account demographics to come up with suitable pictures to display, etc. etc.
Typically you need to have trained data cubes and models before you could tackle this using a standard.
Joined: 23-Apr-2004
Thanks everyone (especially Fishy... Finding Nemo is now on my list of must see dvds )!
Bod I've thought about it and agree that there doesn't seem to be a faster way to do without using 3 queries... Devildog, I'm a big fan of neural nets, but I think it would be overkill for this situation.
Thanks everyone once again!
Marcus
Joined: 29-Mar-2005
Hi,
May be the do the following:
A unique query to get the list of all the images a user has rights to see, this query retrieve the id of the images only and could be restricted to max. 50 or other number of records, from this query you can get the 3 images records each time using an IN sql predicate that contain the 3 id.
Joined: 23-Apr-2004
Rogelio wrote:
A unique query to get the list of all the images a user has rights to see, this query retrieve the id of the images only and could be restricted to max. 50 or other number of records, from this query you can get the 3 images records each time using an IN sql predicate that contain the 3 id.
Thanks for your suggestion, however I'm not sure how you could get rows adjacent to the row whose ID you are looking for using this method... SQL only gives "sets" of results and because the where clause is filtering the resultset by an ID (which bears no relation to the ordering of the resultset), you are not able to get rows either side of the target.
Joined: 22-Feb-2005
I'm assuming SQL Server for this. Oracle and SQL (the two databases I am most familiar with) handle TOP/ROWNUM with very different syntax, so I assume that each database would require a different solution.
If you only need the next and previous item, it's pretty easy to do in a subquery (northwind for all of these):
--select the ordernum, which would be passed in as a param
DECLARE @ordernum int
SELECT @ordernum = 10256
--get the freight of the selected record, plus the freight for
--the chronologically next and previous record for the same employee.
--You would have to add a user rights filter to the subqueries
--which are hard to define without knowing how security works
SELECT
(
SELECT TOP 1
freight
FROM
orders o1
WHERE o1.orderdate < o.orderdate
AND o1.EmployeeID = o.EmployeeID
ORDER BY
orderdate desc
) as previous,
freight,
(
SELECT TOP 1
freight
FROM
orders o1
WHERE o1.orderdate > o.orderdate
AND o1.EmployeeID = o.EmployeeID
ORDER BY
orderdate asc
) as next
FROM
orders o
WHERE
orderid = @ordernum
Not a pretty solution, though. The number of items you return for prev and next is hardcoded, plus the resultset is flat.
You can't use use "TOP @x" in SQL Server, where @x is the number of rows you want. You could actually use this method by creating a SQL string and EXECuting it, but I am not a fan of this practice. If you did go this route, you could use three TOP queries UNIONed together.
The only way I can think to do it dynamically and elegantly is to use a temp table:
--select the ordernum, which would be passed in as a param
DECLARE @ordernum int
SELECT @ordernum = 10256
--select the number of rows before and after
declare @PrevNextCount int
SELECT @PrevNextCount = 5
--set the rowcount to limit the resultset for each query
SET ROWCOUNT @PrevNextCount
--explicitly creating the table with IDENTITY here, to avoid
--the case where the selected record has the same date as records before or
--after it, and thus comes back in the final query in the wrong spot
CREATE TABLE #mytemp
(
rowId int IDENTITY(1,1),
orderId int,
freight money
)
--insert prev records into temp table
INSERT INTO #mytemp (orderId, freight)
SELECT
o.orderid,
o.freight
FROM
orders o
WHERE
o.orderdate < (SELECT orderdate FROM orders where orderId = @ordernum)
AND o.employeeId = (SELECT employeeid FROM orders WHERE orderid = @ordernum)
ORDER BY
orderdate DESC
--insert "main" record
INSERT INTO #mytemp (orderId, freight)
SELECT
o.orderid,
o.freight
FROM
orders o
WHERE
o.orderId = @orderNum
--insert next records into temp table
INSERT INTO #mytemp (orderId, freight)
SELECT
o.orderid,
o.freight
FROM
orders o
WHERE
o.orderdate > (SELECT orderdate FROM orders where orderId = @ordernum)
AND o.employeeId = (SELECT employeeid FROM orders WHERE orderid = @ordernum)
ORDER BY
orderdate ASC
--reset the rowcount to 0 (no limit)
SET ROWCOUNT 0
--return the contents of the temp table
SELECT orderId, freight FROM #mytemp ORDER BY orderId ASC
--drop the temp table
DROP TABLE #mytemp
This could obviously be cleaned up or supplemented to support the app tier.
Re-reading your original question, I'm wondering if I misunderstood the challange. Did you mean one stored proc, or (literally) one query?
Funny, I thought this was an easy problem—it turned out to be trickier than I thought.
Joined: 23-Apr-2004
Thanks for the reply Phil!
psandler wrote:
I'm assuming SQL Server for this. Oracle and SQL (the two databases I am most familiar with) handle TOP/ROWNUM with very different syntax, so I assume that each database would require a different solution.
... code removed...
If you only need the next and previous item, it's pretty easy to do in a subquery
Not a pretty solution, though. The number of items you return for prev and next is hardcoded, plus the resultset is flat.
This would be perfect only the result is flat as you say... In my case I need all the row data.
psandler wrote:
You can't use use "TOP @x" in SQL Server, where @x is the number of rows you want. You could actually use this method by creating a SQL string and EXECuting it, but I am not a fan of this practice. If you did go this route, you could use three TOP queries UNIONed together.
I believe in SQL Server 2005, parameterised TOP is now supported.
psandler wrote:
The only way I can think to do it dynamically and elegantly is to use a temp table:
... code removed...
Re-reading your original question, I'm wondering if I misunderstood the challange. Did you mean one stored proc, or (literally) one query?
Not at all! my question was open to interpretation... I was simply interested in different opinions regarding solving this problem. I think the temp table example gives a neat result but may not be performant given the use of the temp table. It does have the advantage of a single database call however...
The other way to do this would be to have the stored procedure first select the sequence column for the target orderid into a parameter and then have it return 3 separate resultsets for previous, target and next rows... This would be my preferred approach since it would eliminate the temp table and the separate resultsets give me a clean differentiator between the previous, target and next sets of rows making the client side processing easier.
The only problem is that I don't think LLBLGen supports multiple resultsets from a call to a stored procedure... This thread (http://llblgen.com/tinyforum/Messages.aspx?ThreadID=2506) seems to suggest there is a way... But I'm not sure this method would work for an SP.
Frans - Any comment on this point?
psandler wrote:
Funny, I thought this was an easy problem—it turned out to be trickier than I thought.
Me too and hence the post!
Thanks again for your input!
Joined: 29-Mar-2005
Marcus wrote:
Rogelio wrote:
A unique query to get the list of all the images a user has rights to see, this query retrieve the id of the images only and could be restricted to max. 50 or other number of records, from this query you can get the 3 images records each time using an IN sql predicate that contain the 3 id.
Thanks for your suggestion, however I'm not sure how you could get rows adjacent to the row whose ID you are looking for using this method... SQL only gives "sets" of results and because the where clause is filtering the resultset by an ID (which bears no relation to the ordering of the resultset), you are not able to get rows either side of the target.
Hi,
I thought I have to give more details:
- The system know the user id always.
- When the user requests to see the pictures, then system retrieve the list of all the pictures the user has rights to see, this list contain the id of the pictures (and may be the route the get picture if the picture is not saved in the database). This list is sorted by id.
- From the list is easy to navigate the current picture, the previous and next: 1234567 1235678 1236789 <- previous picture 1246789 <- current picture 2124567 <- next picture 3356378 3455432 4326754 4456555 .......
To get the 3 records from database: Select picture from pictureTable where id IN (1236789,1246789,2124567) order by id
If the pictures are not saved in the database, then you use the route to get each picture.
You only need to define a rule for the first picture to show to the user, that could be the last picture the user saw. Then you save the id of the last picture in the user record.
If the user decides to see the next picture (2124567), then:
1234567
1235678
1236789
1246789 <- previous picture
2124567 <- current picture
3356378 <- next picture
3455432
4326754
4456555
.......
And so on.
If you want performance, then as you have the previous, current [1246789] and next pictures: when the user move to the next picture, then you only has to retrieve one picture [3356378] .
Joined: 23-Apr-2004
Rogelio wrote:
I thought I have to give more details:
Thanks Rogelio!
There a couple of issues with your approach which would prohibit me being able to use it in my application...
Rogelio wrote:
- The system know the user id always.
Agreed.
Rogelio wrote:
- When the user requests to see the pictures, then system retrieve the list of all the pictures the user has rights to see, this list contain the id of the pictures (and may be the route the get picture if the picture is not saved in the database). This list is sorted by id.
Unfortunately a given user may have rights to see millions of images and so pulling this data to the client is not possible (even if it were just the IDs). Also SQL Server can also sort rows much faster than the client since it may be able to use indexes...
Rogelio wrote:
If you want performance, then as you have the previous, current [1246789] and next pictures: when the user move to the next picture, then you only has to retrieve one picture [3356378] .
While performance is high priority, the application is stateless which means that nothing is kept in memory between page requests. The second request is likely to be served by a different web server in the cluster.
Thanks for you input!
Marcus
Joined: 21-Sep-2005
Marcus,
Would this do:
SELECT * FROM sometable WHERE id = (SELECT TOP 1 id FROM sometable WHERE id < @current ORDER BY id DESC) OR id = (SELECT TOP 1 id FROM sometable WHERE id > @current ORDER BY id ASC) OR id = @current ORDER BY id
Assuming you have an index on id, the minimum number of disk pages should be hit (the subqueries are executed once). The query should also be amenable to extra predicates (such as the security level checks).
Regards, Ben
Joined: 23-Apr-2004
Ben,
Thanks for the suggestion!
ianbanks wrote:
Would this do:
SELECT * FROM sometable WHERE id = (SELECT TOP 1 id FROM sometable WHERE id < @current ORDER BY id DESC) OR id = (SELECT TOP 1 id FROM sometable WHERE id > @current ORDER BY id ASC) OR id = @current ORDER BY id
I think this only works if the desired resultset is ordered by "id". If we wanted to order by another field, 'name' for instance (but still target the middle row whose id we know...), then I think we would have a problem...
Joined: 21-Sep-2005
Marcus wrote:
Ben,
...
I think this only works if the desired resultset is ordered by "id". If we wanted to order by another field, 'name' for instance (but still target the middle row whose id we know...), then I think we would have a problem...
Hmm, how about:
SELECT * FROM SomeTable WHERE Id = (SELECT TOP 1 Id FROM SomeTable WHERE Name < (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current) ORDER BY Name DESC) OR Id = (SELECT TOP 1 Id FROM SomeTable WHERE Name > (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current) ORDER BY Name ASC) OR Id = @current ORDER BY Name
Of course, you didn't say that Name wasn't unique. grin
Regards, Ben
Joined: 21-Sep-2005
Also, replacing the next and before expressions with something like--
Id = (SELECT TOP 1 Id FROM SomeTable
WHERE Name > (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current)
OR (Name = (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current)
AND Id > @current)
ORDER BY Name ASC, Id ASC)
--should allow it to work even with duplicate names. (I haven't tested it or checked the query plan, so I'm not sure if it still uses the theoretical minimum number of page reads)...
Regards, Ben