Heavily interconnected Db that isn't a graph? /u/NoApparentReason256 Python Education

follow up on this: https://www.reddit.com/r/learnpython/comments/1fu5moc/comment/lpzdmxf/?context=3

So I’m trying to create a database for diseases, which naturally relates diseases with similar symptoms. I want the system to be able to tie things together naturally – I’m running into issues where I can use a join table to give each disease its symptoms, pathophysiology, and treatments (because each is many-to-many), but when I want to more specifically discuss each symptom is a properly granular way, it gets messy fast due to various ‘qualifiers’ – Time, intensity, adjectives

Ex. Coughs can be brief, frequent, episodic – bloody, productive, dry, etc.

In the thread above, I got the tip to create composite keys, but the issue is I’d like to store additional information which can be used by my code in this. i.e. if ‘bloody’ -> give the patient anemia, or make the patient complain about coughing up ‘pink frothy’ stuff. It seems like composite keys will take away the ability to give each descriptor its full information set that the subsequent code would take advantage of.

Another dimension is location. Some diseases only hit big vessels, others small. Some diseases cause marks on the legs, while others on the palms of the hands. I don’t think its feasible to create a hundred column table with nulls covering most of it.

Finally, there is a probability dimension. For example, Asthma might have a 30% chance of a cough, vs Pneumonia might have 60% chance.

On this thread – https://www.reddit.com/r/Database/comments/1fwguue/the_best_approach_for_a_heavily_interconnected_db/, I got some tips but it doesn’t sound like the responders understood the thing I was concerned about. Most people pushed for a single table, but that doesn’t seem feasible to me for the following reason:

“different diseases have different numbers of each trait (some have 1 symptom, some have 5, same for treatments and tests). Once I break off a symptom table, it seems like the granularity of the symptom descriptions available leads to a combinatorial explosion, where each symptom can have a number of different modifiers, same for underlying pathophysiology. And each row will have a small sublet of columns which are relevant for them, leading to a strange and sparse table.

Is there a smarter way to have these core entities isolated and concatenated only when necessary in processing? I’m afraid of ending up with big strings with spacers and needing to use a regex or something alto parse it intelligently before feeding it into the code. Like pneumonia having “Cough_productive_chance70_intermittent_monthsCourse”. This seems like it could get unwieldy quite fast. I save some machine learning pipelines this way and it is painful.”

I’d really appreciate any tips. if the singular way to get around this is a graph db, then I’ll tackle it, but I guess I was hoping for alternatives. Somethings I’ve bumped into but can’t really assess for whether they are appropriate:

For reference, my end goal is a webapp + mobile app which has an offline mode. Not sure how this informs things but want to put it out there.

https://github.com/dpapathanasiou/simple-graph

https://github.com/arturo-lang/grafito?tab=readme-ov-file

Are these good for my ends? Thanks a lot for the help!

submitted by /u/NoApparentReason256
[link] [comments]

r/learnpython follow up on this: https://www.reddit.com/r/learnpython/comments/1fu5moc/comment/lpzdmxf/?context=3 So I’m trying to create a database for diseases, which naturally relates diseases with similar symptoms. I want the system to be able to tie things together naturally – I’m running into issues where I can use a join table to give each disease its symptoms, pathophysiology, and treatments (because each is many-to-many), but when I want to more specifically discuss each symptom is a properly granular way, it gets messy fast due to various ‘qualifiers’ – Time, intensity, adjectives Ex. Coughs can be brief, frequent, episodic – bloody, productive, dry, etc. In the thread above, I got the tip to create composite keys, but the issue is I’d like to store additional information which can be used by my code in this. i.e. if ‘bloody’ -> give the patient anemia, or make the patient complain about coughing up ‘pink frothy’ stuff. It seems like composite keys will take away the ability to give each descriptor its full information set that the subsequent code would take advantage of. Another dimension is location. Some diseases only hit big vessels, others small. Some diseases cause marks on the legs, while others on the palms of the hands. I don’t think its feasible to create a hundred column table with nulls covering most of it. Finally, there is a probability dimension. For example, Asthma might have a 30% chance of a cough, vs Pneumonia might have 60% chance. On this thread – https://www.reddit.com/r/Database/comments/1fwguue/the_best_approach_for_a_heavily_interconnected_db/, I got some tips but it doesn’t sound like the responders understood the thing I was concerned about. Most people pushed for a single table, but that doesn’t seem feasible to me for the following reason: “different diseases have different numbers of each trait (some have 1 symptom, some have 5, same for treatments and tests). Once I break off a symptom table, it seems like the granularity of the symptom descriptions available leads to a combinatorial explosion, where each symptom can have a number of different modifiers, same for underlying pathophysiology. And each row will have a small sublet of columns which are relevant for them, leading to a strange and sparse table. Is there a smarter way to have these core entities isolated and concatenated only when necessary in processing? I’m afraid of ending up with big strings with spacers and needing to use a regex or something alto parse it intelligently before feeding it into the code. Like pneumonia having “Cough_productive_chance70_intermittent_monthsCourse”. This seems like it could get unwieldy quite fast. I save some machine learning pipelines this way and it is painful.” I’d really appreciate any tips. if the singular way to get around this is a graph db, then I’ll tackle it, but I guess I was hoping for alternatives. Somethings I’ve bumped into but can’t really assess for whether they are appropriate: For reference, my end goal is a webapp + mobile app which has an offline mode. Not sure how this informs things but want to put it out there. Graph related: https://tinkerpop.apache.org/docs/current/reference/#gremlin-python https://github.com/dpapathanasiou/simple-graph https://github.com/arturo-lang/grafito?tab=readme-ov-file Are these good for my ends? Thanks a lot for the help! submitted by /u/NoApparentReason256 [link] [comments]